Payment card fraud - including both credit cards and debit cards - is forecast to grow by over ** billion U.S. dollars between 2022 and 2028. Especially outside the United States, the amount of fraudulent payments almost doubled from 2014 to 2021. In total, fraudulent card payments reached ** billion U.S. dollars in 2021. Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018.
Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018. It was estimated that merchants and card acquirers lost well over ** billion U.S. dollars, with - so the source adds - roughly ** billion U.S. dollar coming from the United States alone. Note that the figures provided here included both credit card fraud and debit card fraud. The source does not separate between the two, and also did not provide figures on the United States - a country known for its reliance on credit cards.
The dataset is generated using the Faker library to simulate transaction data. It contains several columns that represent both user and transaction information, including features for detecting fraudulent activities. The data includes a mix of categorical, numerical, and datetime values, which need to be processed for machine learning.
This dataset was created by Dileep
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Below is a draft DMP–style description of your credit‐card fraud detection experiment, modeled on the antiquities example:
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g. transaction_amount
, is_declined
) so they conform to DBRepo’s requirements.
Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr
. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined
, isforeigntransaction
, ishighriskcountry
, isfradulent
) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr
, merchant_id
).
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
Dataset Structure
The raw data is a single CSV with columns:
actionnr
(integer transaction ID)
merchant_id
(string)
average_amount_transaction_day
(float)
transaction_amount
(float)
is_declined
, isforeigntransaction
, ishighriskcountry
, isfradulent
(binary flags)
total_number_of_declines_day
, daily_chargeback_avg_amt
, sixmonth_avg_chbk_amt
, sixmonth_chbk_freq
(numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training
, creditcard_validation
, creditcard_test
in DBRepo.
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo‐client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook’s dbrepo_client.py
template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
Anonymized PCA features (V1
–V28
) hidden; we extended with domain features but cannot reverse engineer raw variables.
Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
DUWRD records include ORCID identifiers for the author.
Recommended Uses
Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model‐training pipelines, FAIR data practices.
Extension: adding time‐series or deep‐learning models.
Known Issues
Possible temporal leakage if date/time features not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
In 2022, about ******* complaints filed with the Federal Trade Commission (FTC) were due to credit card fraud in the United States. An additional ****** complaints were filed with the FTC due to government documents/benefits fraud.
A September 2023 survey of American adults found that the most frequently experienced type of financial cybercrime was credit card fraud, reported by roughly 64 percent of respondents. The breach of financial data was ranked second, followed by account hacking.
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
The global credit card fraud detection platform market is experiencing robust growth, driven by the increasing prevalence of digital transactions and the sophistication of fraudulent activities. The market, estimated at $15 billion in 2025, is projected to maintain a healthy Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $45 billion by 2033. This expansion is fueled by several key factors: the rising adoption of e-commerce and mobile payments, the increasing volume of online transactions, the growing need for robust security measures among businesses to protect customer data and prevent financial losses, and the continuous evolution of fraud techniques necessitating advanced detection capabilities. Furthermore, the increasing regulatory scrutiny and compliance requirements are pushing organizations to invest heavily in sophisticated fraud detection systems. The market is segmented by deployment (cloud-based and on-premise), by organization size (small, medium, and large enterprises), and by industry vertical (banking, financial services, and insurance, retail, healthcare, and others). Key players in this dynamic market include established companies like Kount, ClearSale, Stripe Radar, Riskified, and FICO, alongside emerging technology providers like Akkio and Dataiku. These companies are constantly innovating to improve detection accuracy, reduce false positives, and offer seamless integration with existing payment processing systems. While challenges remain, such as the rising complexity of fraud schemes and the need to balance security with user experience, the market is poised for continued strong growth, driven by technological advancements in machine learning, artificial intelligence, and big data analytics. The increasing adoption of real-time fraud detection and advanced analytics capabilities will further shape the market landscape in the coming years, creating opportunities for both established and emerging players.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.
The dataset has been split into training and testing sets and saved in the following files: - X_train.csv: Feature data for the training set - X_test.csv: Feature data for the testing set - y_train.csv: Labels for the training set (fraudulent or legitimate) - y_test.csv: Labels for the testing set
This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.
This description highlights the key aspects of the dataset, including its preprocessing steps and the availability of the processed files for ease of use.
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
The global credit card generator market is projected to experience robust growth with a market size of approximately USD 580 million in 2023, and it is anticipated to reach USD 1.2 billion by 2032, exhibiting a compound annual growth rate (CAGR) of 8.5%. The rising need for secure and efficient credit card testing tools, driven by the expansion of e-commerce and digital transactions, forms a significant growth catalyst for this market. As online retail and digital financial services burgeon, the demand for reliable credit card generators continues to escalate, underscoring the importance of this market segment.
One of the pivotal growth drivers for the credit card generator market is the increasing complexity and sophistication of online payment systems. As e-commerce platforms and digital payment solutions proliferate worldwide, there is a growing need for comprehensive testing tools to ensure the reliability and security of these systems. Credit card generators play a crucial role in this context by providing developers and testers with the means to simulate various credit card scenarios, thereby enhancing the robustness of payment processing systems. Additionally, the rise in cyber threats and fraud necessitates stringent testing, further propelling market growth.
Another significant factor contributing to the market's expansion is the growing emphasis on fraud prevention and security. Financial institutions and businesses are increasingly investing in sophisticated tools to combat fraud and secure financial transactions. Credit card generators offer a practical solution for testing the efficacy of anti-fraud measures and ensuring that security protocols are adequately robust. By enabling the simulation of fraudulent activities and various transaction scenarios, these tools help organizations better prepare for and mitigate potential security breaches.
Furthermore, the marketing and promotional applications of credit card generators are also driving market growth. Companies leveraging digital marketing strategies use these tools to create dummy credit card numbers for various promotional activities, such as offering free trials or discounts, without exposing real customer data. This capability not only aids in marketing efforts but also ensures compliance with data privacy regulations, thereby enhancing consumer trust and brand reputation. The versatility of credit card generators in supporting both operational and marketing functions underscores their growing importance in the digital age.
Regionally, North America holds a significant share of the credit card generator market, driven by the high penetration of digital payment systems and advanced cybersecurity measures in the region. The presence of numerous financial institutions and technology companies further bolsters the market in North America. Meanwhile, Asia Pacific is expected to witness the fastest growth, fueled by the rapid digitalization of economies, increasing internet penetration, and burgeoning e-commerce activities. Europe also presents substantial opportunities due to stringent data protection regulations and the widespread adoption of digital transaction systems.
The credit card generator market can be segmented by type into software and online services. Software-based credit card generators are widely used by developers and testers within organizations to simulate credit card transactions and validate payment processing systems. These tools are typically integrated into the development and testing environments, providing a controlled and secure platform for generating valid credit card numbers. The demand for software-based generators is driven by their ability to offer customizable options and advanced features, such as bulk generation and API integration, which enhance the efficiency of testing processes.
Online services, on the other hand, cater to a broader audience, including individual users, small businesses, and marketers. These services are accessible via web platforms and provide an easy-to-use interface for generating credit card numbers for various purposes, such as testing, fraud prevention, and marketing promotions. The growing popularity of online credit card generators can be attributed to their convenience, accessibility, and the increasing need for temporary and disposable credit card numbers in the digital economy. These services are particularly useful for busin
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset has been released by [1], which had been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of Université Libre de Bruxelles (ULB) on big data mining and fraud detection. [1] Pozzolo, A. D., Caelan, O., Johnson, R. A., and Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. 2015 IEEE Symposium Series on Computational, pp. 159-166, doi: 10.1109/SSCI.2015.33 open source kaggle : https://www.kaggle.com/mlg-ulb/creditcardfraud
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Credit Card Transactions Dataset includes more than 20 million credit card transactions over the decades of 2,000 U.S. resident consumers created by IBM's simulations, providing details of each transaction and fraudulent labels.
2) Data Utilization (1) Credit Card Transactions Dataset has characteristics that: • This dataset provides a variety of properties that are similar to real credit card transactions, including transaction amount, time, card information, purchase location, and store category (MCC). (2) Credit Card Transactions Dataset can be used to: • Development of Credit Card Fraud Detection Model: Using transaction history and properties, you can build a fraud (abnormal transaction) detection model based on machine learning. • Analysis of consumption patterns and risks: Long-term and diverse transaction data can be used to analyze customer consumption behavior and identify risk factors.
In the late 1970s, the Rand Corporation pioneered a method of collecting crime rate statistics. They obtained reports of offending behavior--types and frequencies of crimes committed--directly from offenders serving prison sentences. The current study extends this research by exploring the extent to which variation in the methodological approach affects prisoners' self-reports of criminal activity. If the crime rates reported in this survey remained constant across methods, perhaps one of the new techniques developed would be easier and/or less expensive to administer. Also, the self-reported offending rate data for female offenders in this collection represents the first time such data has been collected for females. Male and female prisoners recently admitted to the Diagnostic Unit of the Colorado Department of Corrections were selected for participation in the study. Prisoners were given one of two different survey instruments, referred to as the long form and short form. Both questionnaires dealt with the number of times respondents committed each of eight types of crimes during a 12-month measurement period. The crimes of interest were burglary, robbery, assault, theft, auto theft, forgery/credit card and check-writing crimes, fraud, and drug dealing. The long form of the instrument focused on juvenile and adult criminal activity and covered the offender's childhood and family. It also contained questions about the offender's rap sheet as one of the bases for validating the self-reported data. The crime count sections of the long form contained questions about motivation, initiative, whether the offender usually acted alone or with others, and if the crimes recorded included crimes against people he or she knew. Long-form data are given in Part 1. The short form of the survey had fewer or no questions compared with the long form on areas such as the respondent's rap sheet, the number of crimes committed as a juvenile, the number of times the respondent was on probation or parole, the respondent's childhood experiences, and the respondent's perception of his criminal career. These data are contained in Part 2. In addition, the surveys were administered under different conditions of confidentiality. Prisoners given what were called "confidential" interviews had their names identified with the survey. Those interviewed under conditions of anonymity did not have their names associated with the survey. The short forms were all administered anonymously, while the long forms were either anonymous or confidential. In addition to the surveys, data were collected from official records, which are presented in Part 3. The official record data collection form was designed to collect detailed criminal history information, particularly during the measurement period identified in the questionnaires, plus a number of demographic and drug-use items. This information, when compared with the self-reported offense data from the measurement period in both the short and long forms, allows a validity analysis to be performed.
This statistic presents the value of losses due to synthetic credit card fraud in the United States from 2015 to 2017, with projections extending to 2020. Such fraud led to *** million U.S. dollars in damages in 2017, an amount which was expected to increase to nearly **** trillion U.S. dollars in 2020.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
As a data contributor, I'm sharing this crucial dataset focused on the detection of fraudulent credit card transactions. Recognizing these illicit activities is paramount for protecting customers and the integrity of financial systems.
About the Dataset:
This dataset encompasses credit card transactions made by European cardholders during a two-day period in September 2013. It presents a real-world scenario with a significant class imbalance, where fraudulent transactions are considerably less frequent than legitimate ones. Out of a total of 284,807 transactions, only 492 are instances of fraud, representing a mere 0.172% of the entire dataset.
Content of the Data:
Due to confidentiality concerns, the majority of the input features in this dataset have undergone a Principal Component Analysis (PCA) transformation. This means the original meaning and context of features V1, V2, ..., V28 are not directly provided. However, these principal components capture the variance in the underlying transaction data.
The only features that have not been transformed by PCA are:
The target variable for this classification task is:
Important Note on Evaluation:
Given the substantial class imbalance (far more legitimate transactions than fraudulent ones), traditional accuracy metrics based on the confusion matrix can be misleading. It is strongly recommended to evaluate models using the Area Under the Precision-Recall Curve (AUPRC), as this metric is more sensitive to the performance on the minority class (fraudulent transactions).
How to Use This Dataset:
Acknowledgements and Citation:
This dataset has been collected and analyzed through a research collaboration between Worldline and the Machine Learning Group (MLG) of ULB (Université Libre de Bruxelles).
When using this dataset in your research or projects, please cite the following works as appropriate:
Credit card fraud identification is an important issue in risk prevention and control for banks and financial institutions. In order to establish an efficient credit card fraud identification model, this article studied the relevant factors that affect fraud identification. A credit card fraud identification model based on neural networks was constructed, and in-depth discussions and research were conducted. First, the layers of neural networks were deepened to improve the prediction accuracy of the model; second, this paper increase the hidden layer width of the neural network to improve the prediction accuracy of the model. This article proposes a new fusion neural network model by combining deep neural networks and wide neural networks, and applies the model to credit card fraud identification. The characteristic of this model is that the accuracy of prediction and F1 score are relatively high. Finally, use the random gradient descent method to train the model. On the test set, the proposed method has an accuracy of 96.44% and an F1 value of 96.17%, demonstrating good fraud recognition performance. After comparison, the method proposed in this paper is superior to machine learning models, ensemble learning models, and deep learning models.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed credit card transaction records enriched with fraud suspicion flags, risk scores, and contextual information such as merchant, location, and transaction method. It is ideal for developing, training, and evaluating fraud detection models, as well as for analyzing transaction patterns and identifying emerging fraud tactics in the financial sector.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malaysia Consumers: Security: Credit Card/Debit Card/Bank Fraud data was reported at 63.900 % in 2018. Malaysia Consumers: Security: Credit Card/Debit Card/Bank Fraud data is updated yearly, averaging 63.900 % from Dec 2018 (Median) to 2018, with 1 observations. Malaysia Consumers: Security: Credit Card/Debit Card/Bank Fraud data remains active status in CEIC and is reported by Malaysian Communications and Multimedia Commission. The data is categorized under Global Database’s Malaysia – Table MY.S026: E-Commerce Consumer Survey.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed records of credit card fraud alerts, including suspicious transaction details, merchant information, alert status, and investigator actions. It enables financial institutions to detect, investigate, and respond to fraudulent activities efficiently, supporting enterprise risk management and loss mitigation.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains simulated credit card transaction records, including detailed information on transaction amounts, merchant details, geolocation, device usage, and fraud labels. It is designed for training and evaluating fraud detection models, supporting the identification of both typical and anomalous transaction patterns. The dataset is ideal for fintech AI development, security analytics, and research into payment fraud behaviors.
Payment card fraud - including both credit cards and debit cards - is forecast to grow by over ** billion U.S. dollars between 2022 and 2028. Especially outside the United States, the amount of fraudulent payments almost doubled from 2014 to 2021. In total, fraudulent card payments reached ** billion U.S. dollars in 2021. Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018.