Payment card fraud - including both credit cards and debit cards - is forecast to grow by over ** billion U.S. dollars between 2022 and 2028. Especially outside the United States, the amount of fraudulent payments almost doubled from 2014 to 2021. In total, fraudulent card payments reached ** billion U.S. dollars in 2021. Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018.
It was estimated that merchants and card acquirers lost well over ** billion U.S. dollars, with roughly ** billion U.S. dollars, according to the source, coming from the United States alone. Note that the figures provided here include both credit card fraud and debit card fraud. The source does not separate the two, and also did not provide figures on the United States - a country known for its reliance on credit cards.
The dataset is generated using the Faker library to simulate transaction data. It contains several columns that represent both user and transaction information, including features for detecting fraudulent activities. The data includes a mix of categorical, numerical, and datetime values, which need to be processed for machine learning.
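As a rough illustration of the processing mentioned above, the sketch below builds a few simulated records and turns their categorical, numerical, and datetime fields into plain numbers. It uses Python's standard `random` module as a stand-in for Faker so that it runs without extra dependencies. All column names, the category list, and the 2% fraud rate are assumptions for illustration and are not taken from the dataset.

```python
import random
from datetime import datetime, timedelta

# Hypothetical merchant categories; the real dataset's values are not known here.
CATEGORIES = ["grocery", "travel", "electronics"]


def simulate_transaction(rng, start):
    """Return one simulated transaction with categorical, numeric and datetime fields."""
    return {
        "user_id": rng.randint(1, 100),
        "merchant_category": rng.choice(CATEGORIES),
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "timestamp": start + timedelta(seconds=rng.randint(0, 86_399)),
        "is_fraud": int(rng.random() < 0.02),  # assumed fraud rate
    }


def to_features(tx):
    """Encode one transaction as a flat list of numbers."""
    # One-hot encode the categorical column.
    one_hot = [int(tx["merchant_category"] == c) for c in CATEGORIES]
    # Keep the numeric amount as-is and reduce the datetime to its hour.
    return [tx["amount"], tx["timestamp"].hour, *one_hot]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
start = datetime(2024, 1, 1)
rows = [simulate_transaction(rng, start) for _ in range(5)]
features = [to_features(tx) for tx in rows]
```

Each feature row then holds the amount, the hour of the day, and one indicator per category, which is the kind of all-numeric input most classifiers expect.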
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Draft DMP-style description of a credit-card fraud detection experiment:
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.
Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets, training (70%), validation (15%), and test (15%), using range-based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non-feature identifiers (actionnr, merchant_id).
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
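The splitting and cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used with DBRepo: the helper names `split_bounds` and `clean` and the toy rows are hypothetical, and only the column names come from the description. Integer percentages are used so that the range boundaries are exact.

```python
FLAG_COLUMNS = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
ID_COLUMNS = ["actionnr", "merchant_id"]


def split_bounds(min_id, max_id, train_pct=70, val_pct=15):
    """Return inclusive primary-key ranges for training, validation and test."""
    n = max_id - min_id + 1
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train_end = min_id + n_train - 1
    val_end = train_end + n_val
    return {
        "training": (min_id, train_end),
        "validation": (train_end + 1, val_end),
        "test": (val_end + 1, max_id),  # the test split takes the remainder
    }


def clean(row):
    """Map "Y"/"N" flags to 1/0 and drop the identifier columns."""
    cleaned = {k: v for k, v in row.items() if k not in ID_COLUMNS}
    for col in FLAG_COLUMNS:
        cleaned[col] = 1 if row[col] == "Y" else 0
    return cleaned


# Toy rows standing in for the real table (values are made up).
rows = [
    {
        "actionnr": i,
        "merchant_id": f"M{i}",
        "transaction_amount": 10.0 * i,
        "is_declined": "N",
        "isforeigntransaction": "Y" if i % 2 else "N",
        "ishighriskcountry": "N",
        "isfradulent": "Y" if i % 10 == 0 else "N",
    }
    for i in range(1, 21)
]

bounds = split_bounds(1, 20)
# Filter on the primary key first, then clean, because cleaning drops actionnr.
subsets = {
    name: [clean(r) for r in rows if low <= r["actionnr"] <= high]
    for name, (low, high) in bounds.items()
}
```

With 20 toy rows this yields 14 training, 3 validation, and 3 test rows. A range-based split of this kind assumes the key values are contiguous; gaps in actionnr would shift the actual proportions.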
Dataset Structure
The raw data is a single CSV with columns:
actionnr (integer transaction ID)
merchant_id (string)
average_amount_transaction_day (float)
transaction_amount (float)
is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)
total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo‐client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook’s dbrepo_client.py template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
Anonymized PCA features (V1–V28) hidden; we extended with domain features but cannot reverse engineer raw variables.
Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
TU WRD records include ORCID identifiers for the author.
Recommended Uses
Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model‐training pipelines, FAIR data practices.
Extension: adding time‐series or deep‐learning models.
Known Issues
Possible temporal leakage if date/time features not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
This dataset was created by Dileep
U.S. consumers reported about *** million U.S. dollars' worth of credit card fraud in the first quarter of 2025, the second increase in a row. This is according to a report by the organization that collects such consumer reports submitted to local law enforcement. While credit cards are relatively popular in the United States, the highest-value types of fraud are reported for bank transfers and cryptocurrencies. The latter is relatively surprising, as the global size of crypto fraud is reported to be much lower than that of hacks involving cryptocurrency.
https://www.datainsightsmarket.com/privacy-policy
The global credit card fraud detection platform market is experiencing robust growth, driven by the increasing prevalence of digital transactions and the sophistication of fraudulent activities. The market, estimated at $15 billion in 2025, is projected to maintain a healthy Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $45 billion by 2033. This expansion is fueled by several key factors: the rising adoption of e-commerce and mobile payments, the increasing volume of online transactions, the growing need for robust security measures among businesses to protect customer data and prevent financial losses, and the continuous evolution of fraud techniques necessitating advanced detection capabilities. Furthermore, the increasing regulatory scrutiny and compliance requirements are pushing organizations to invest heavily in sophisticated fraud detection systems. The market is segmented by deployment (cloud-based and on-premise), by organization size (small, medium, and large enterprises), and by industry vertical (banking, financial services, and insurance, retail, healthcare, and others). Key players in this dynamic market include established companies like Kount, ClearSale, Stripe Radar, Riskified, and FICO, alongside emerging technology providers like Akkio and Dataiku. These companies are constantly innovating to improve detection accuracy, reduce false positives, and offer seamless integration with existing payment processing systems. While challenges remain, such as the rising complexity of fraud schemes and the need to balance security with user experience, the market is poised for continued strong growth, driven by technological advancements in machine learning, artificial intelligence, and big data analytics. The increasing adoption of real-time fraud detection and advanced analytics capabilities will further shape the market landscape in the coming years, creating opportunities for both established and emerging players.
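The growth figures quoted above can be checked with a compound-growth calculation. The sketch below is only an arithmetic illustration: it assumes the 15% rate compounds annually over the eight years from 2025 to 2033, and the function name `project` is made up for this example.

```python
def project(value, rate, years):
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1 + rate) ** years


# USD 15 billion in 2025, growing 15% per year until 2033.
projected_2033 = project(15.0, 0.15, 2033 - 2025)
print(f"Projected 2033 market size: about {projected_2033:.1f} billion USD")
```

Under these assumptions the projection lands a little under USD 46 billion, which is in line with the "approximately $45 billion" stated in the text.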
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
As a data contributor, I'm sharing this crucial dataset focused on the detection of fraudulent credit card transactions. Recognizing these illicit activities is paramount for protecting customers and the integrity of financial systems.
About the Dataset:
This dataset encompasses credit card transactions made by European cardholders during a two-day period in September 2013. It presents a real-world scenario with a significant class imbalance, where fraudulent transactions are considerably less frequent than legitimate ones. Out of a total of 284,807 transactions, only 492 are instances of fraud, representing a mere 0.172% of the entire dataset.
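To make the imbalance concrete, the short calculation below derives the fraud share from the two counts given above and shows how well a trivial model that labels every transaction as legitimate would score on plain accuracy. The variable names are illustrative only.

```python
total_transactions = 284_807
fraud_transactions = 492

# Share of fraudulent transactions (the positive class).
fraud_rate = fraud_transactions / total_transactions

# Accuracy of a model that always predicts "legitimate".
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions

print(f"Fraud share: {fraud_rate:.2%}")
print(f"Accuracy of always predicting 'legitimate': {baseline_accuracy:.2%}")
```

The fraud share comes out at roughly 0.17%, so a model that never flags anything would still be right for more than 99.8% of transactions while catching no fraud at all.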
Content of the Data:
Due to confidentiality concerns, the majority of the input features in this dataset have undergone a Principal Component Analysis (PCA) transformation. This means the original meaning and context of features V1, V2, ..., V28 are not directly provided. However, these principal components capture the variance in the underlying transaction data.
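The only features that have not been transformed by PCA are 'Time' (the seconds elapsed between each transaction and the first transaction in the dataset) and 'Amount' (the transaction amount).
The target variable for this classification task is 'Class', which takes the value 1 in case of fraud and 0 otherwise.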
Important Note on Evaluation:
Given the substantial class imbalance (far more legitimate transactions than fraudulent ones), traditional accuracy metrics based on the confusion matrix can be misleading. It is strongly recommended to evaluate models using the Area Under the Precision-Recall Curve (AUPRC), as this metric is more sensitive to the performance on the minority class (fraudulent transactions).
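One common way to summarize the precision-recall curve is average precision, which averages the precision measured at the rank of each true fraud case when transactions are sorted by predicted score. The sketch below is a minimal pure-Python version for illustration; the function name and the toy labels and scores are made up, tied scores are not treated specially, and in practice a library routine such as scikit-learn's `average_precision_score` would normally be used instead.

```python
def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive example."""
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy example: 2 frauds among 10 transactions.
y_true = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.10, 0.40, 0.90, 0.20, 0.35, 0.30, 0.05, 0.15, 0.25, 0.12]

ap = average_precision(y_true, scores)

# Accuracy of predicting "legitimate" for everything, for comparison.
all_legit_accuracy = y_true.count(0) / len(y_true)
```

In this toy case the two frauds are ranked first and third, so the average precision is (1/1 + 2/3) / 2. The always-legitimate baseline still reaches 80% accuracy without ranking anything, which is why accuracy alone says little on imbalanced data.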
How to Use This Dataset:
Acknowledgements and Citation:
This dataset has been collected and analyzed through a research collaboration between Worldline and the Machine Learning Group (MLG) of ULB (Université Libre de Bruxelles).
When using this dataset in your research or projects, please cite the following works as appropriate:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.
The dataset has been split into training and testing sets and saved in the following files:
- X_train.csv: Feature data for the training set
- X_test.csv: Feature data for the testing set
- y_train.csv: Labels for the training set (fraudulent or legitimate)
- y_test.csv: Labels for the testing set
This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.
In 2022, the state of Bihar in India had the highest number of credit and debit card frauds, with approximately *** cases registered with the authorities. The country recorded over *** thousand cases of credit and debit card frauds that year. This category of crime came under the purview of Sections *** of the Indian Penal Code.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset was released by [1]. It was collected and analysed during a research collaboration between Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of Université Libre de Bruxelles (ULB) on big data mining and fraud detection. [1] Dal Pozzolo, A., Caelen, O., Johnson, R. A., and Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. 2015 IEEE Symposium Series on Computational Intelligence, pp. 159-166, doi: 10.1109/SSCI.2015.33. Open source on Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
This graph illustrates the distribution of the domestic fraud rate of bank card transactions in France between 2011 and 2018, by type of payment. In 2018, the fraud rate for bank withdrawals in France amounted to 0.02 percent.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Credit Card Transactions Dataset includes more than 20 million credit card transactions, spanning decades, for 2,000 U.S.-resident consumers. The data was generated by IBM simulations and provides details of each transaction along with fraud labels.
2) Data Utilization
(1) Characteristics of the Credit Card Transactions Dataset:
• The dataset provides a variety of properties similar to those of real credit card transactions, including transaction amount, time, card information, purchase location, and store category (MCC).
(2) The Credit Card Transactions Dataset can be used for:
• Development of credit card fraud detection models: Using transaction history and properties, you can build a machine-learning-based model for detecting fraud (abnormal transactions).
• Analysis of consumption patterns and risks: Long-term and diverse transaction data can be used to analyze customer consumption behavior and identify risk factors.
The U.S. payment industry began migrating to EMV chip-card technology in the mid-2010s to mitigate card-present fraud, especially counterfeit fraud. However, for non-prepaid debit card transactions processed by dual-message networks, the counterfeit fraud rate has not declined, and the lost-or-stolen fraud rate and overall card-present fraud rate have increased. For these transactions, card-present fraud loss rates have declined for issuers but increased for merchants and cardholders.
https://dataintelo.com/privacy-and-policy
The global credit card generator market is projected to experience robust growth with a market size of approximately USD 580 million in 2023, and it is anticipated to reach USD 1.2 billion by 2032, exhibiting a compound annual growth rate (CAGR) of 8.5%. The rising need for secure and efficient credit card testing tools, driven by the expansion of e-commerce and digital transactions, forms a significant growth catalyst for this market. As online retail and digital financial services burgeon, the demand for reliable credit card generators continues to escalate, underscoring the importance of this market segment.
One of the pivotal growth drivers for the credit card generator market is the increasing complexity and sophistication of online payment systems. As e-commerce platforms and digital payment solutions proliferate worldwide, there is a growing need for comprehensive testing tools to ensure the reliability and security of these systems. Credit card generators play a crucial role in this context by providing developers and testers with the means to simulate various credit card scenarios, thereby enhancing the robustness of payment processing systems. Additionally, the rise in cyber threats and fraud necessitates stringent testing, further propelling market growth.
Another significant factor contributing to the market's expansion is the growing emphasis on fraud prevention and security. Financial institutions and businesses are increasingly investing in sophisticated tools to combat fraud and secure financial transactions. Credit card generators offer a practical solution for testing the efficacy of anti-fraud measures and ensuring that security protocols are adequately robust. By enabling the simulation of fraudulent activities and various transaction scenarios, these tools help organizations better prepare for and mitigate potential security breaches.
Furthermore, the marketing and promotional applications of credit card generators are also driving market growth. Companies leveraging digital marketing strategies use these tools to create dummy credit card numbers for various promotional activities, such as offering free trials or discounts, without exposing real customer data. This capability not only aids in marketing efforts but also ensures compliance with data privacy regulations, thereby enhancing consumer trust and brand reputation. The versatility of credit card generators in supporting both operational and marketing functions underscores their growing importance in the digital age.
Regionally, North America holds a significant share of the credit card generator market, driven by the high penetration of digital payment systems and advanced cybersecurity measures in the region. The presence of numerous financial institutions and technology companies further bolsters the market in North America. Meanwhile, Asia Pacific is expected to witness the fastest growth, fueled by the rapid digitalization of economies, increasing internet penetration, and burgeoning e-commerce activities. Europe also presents substantial opportunities due to stringent data protection regulations and the widespread adoption of digital transaction systems.
The credit card generator market can be segmented by type into software and online services. Software-based credit card generators are widely used by developers and testers within organizations to simulate credit card transactions and validate payment processing systems. These tools are typically integrated into the development and testing environments, providing a controlled and secure platform for generating valid credit card numbers. The demand for software-based generators is driven by their ability to offer customizable options and advanced features, such as bulk generation and API integration, which enhance the efficiency of testing processes.
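The text does not say what makes a generated number "valid". A reasonable assumption is that it at least passes the Luhn checksum that payment card numbers carry, so the sketch below shows such a check. The function name is illustrative, and `4111111111111111` is a widely used test number rather than a real card.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

Passing this check only means the number is well formed. It says nothing about whether an account exists, which is why such numbers are suitable for exercising payment forms in test environments.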
Online services, on the other hand, cater to a broader audience, including individual users, small businesses, and marketers. These services are accessible via web platforms and provide an easy-to-use interface for generating credit card numbers for various purposes, such as testing, fraud prevention, and marketing promotions. The growing popularity of online credit card generators can be attributed to their convenience, accessibility, and the increasing need for temporary and disposable credit card numbers in the digital economy. These services are particularly useful for busin
http://opendatacommons.org/licenses/dbcl/1.0/
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced, the positive class (frauds) account for 0.172% of all transactions.
It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used, for example, for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and it takes value 1 in case of fraud and 0 otherwise.
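Because 'Time' is an offset in seconds rather than a timestamp, it is often easier to work with once it is broken into a day index and an hour within that day. The sketch below shows one way to do this. The function and key names are hypothetical, and since the clock time of the first transaction is not given, the hour is relative to the start of the recording and not a true hour of day.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def time_features(seconds_elapsed):
    """Split the 'Time' offset into a day index and an hour offset within that day."""
    day_index = int(seconds_elapsed // SECONDS_PER_DAY)
    hour_offset = int(seconds_elapsed % SECONDS_PER_DAY // SECONDS_PER_HOUR)
    return {"day_index": day_index, "hour_offset": hour_offset}


first = time_features(0)        # the first transaction in the dataset
later = time_features(90_000)   # 25 hours after the first transaction
```

Here `later` falls on the second day (index 1), one hour into it, which matches the two-day span described above.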
Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project
Please cite the following works:
Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015
Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon
Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE
Dal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)
Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection, INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection Information Sciences, 2019
Yann-Aël Le Borgne, Gianluca Bontempi Reproducible machine Learning for Credit Card Fraud Detection - Practical Handbook
Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé, Gianluca Bontempi Incremental learning strategies for credit cards fraud detection, International Journal of Data Science and Analytics
This dataset was created by morris njoroge
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed credit card transaction records enriched with fraud suspicion flags, risk scores, and contextual information such as merchant, location, and transaction method. It is ideal for developing, training, and evaluating fraud detection models, as well as for analyzing transaction patterns and identifying emerging fraud tactics in the financial sector.
Credit card fraud identification is an important issue in risk prevention and control for banks and financial institutions. To establish an efficient credit card fraud identification model, this article studies the relevant factors that affect fraud identification. A credit card fraud identification model based on neural networks is constructed and examined in depth. First, the number of layers in the neural network is increased to improve the prediction accuracy of the model; second, the width of the hidden layers is increased for the same purpose. The article then proposes a new fusion neural network model that combines deep and wide neural networks, and applies it to credit card fraud identification. This model is characterized by relatively high prediction accuracy and F1 score. Finally, the model is trained with stochastic gradient descent. On the test set, the proposed method achieves an accuracy of 96.44% and an F1 score of 96.17%, demonstrating good fraud recognition performance. In the reported comparison, the proposed method outperforms the machine learning, ensemble learning, and deep learning models it was compared against.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains simulated credit card transaction records, including detailed information on transaction amounts, merchant details, geolocation, device usage, and fraud labels. It is designed for training and evaluating fraud detection models, supporting the identification of both typical and anomalous transaction patterns. The dataset is ideal for fintech AI development, security analytics, and research into payment fraud behaviors.