Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Draft DMP-style description of a credit-card fraud detection experiment, modeled on the antiquities example.
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.
Data import: Uploaded the full CSV into DBRepo and assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
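The splitting and cleaning steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's notebook code: the helper names `split_by_key_range` and `clean_row` and the toy rows are hypothetical, and the sketch assumes that cutting the rows in `actionnr` order approximates the 70/15/15 proportions.

```python
# Column names are taken from the description; everything else is illustrative.
FLAG_COLUMNS = ("is_declined", "isforeigntransaction",
                "ishighriskcountry", "isfradulent")
ID_COLUMNS = ("actionnr", "merchant_id")


def split_by_key_range(rows, key="actionnr", train_pct=70, val_pct=15):
    """Split rows into train/validation/test by position in key order."""
    ordered = sorted(rows, key=lambda r: r[key])
    n = len(ordered)
    train_end = n * train_pct // 100           # integer math avoids float rounding
    val_end = train_end + n * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


def clean_row(row):
    """Map "Y"/"N" flags to 1/0 and drop the non-feature identifier columns."""
    cleaned = {k: v for k, v in row.items() if k not in ID_COLUMNS}
    for col in FLAG_COLUMNS:
        cleaned[col] = 1 if cleaned[col] == "Y" else 0
    return cleaned


# Toy rows standing in for the real table (all values are made up).
rows = [
    {"actionnr": i, "merchant_id": f"m{i}", "transaction_amount": 10.0 * i,
     "is_declined": "N", "isforeigntransaction": "Y" if i % 2 else "N",
     "ishighriskcountry": "N", "isfradulent": "Y" if i % 10 == 0 else "N"}
    for i in range(1, 21)
]

train_rows, val_rows, test_rows = split_by_key_range(rows)
train_clean = [clean_row(r) for r in train_rows]
print(len(train_rows), len(val_rows), len(test_rows))
```

Splitting by key order keeps each subset reproducible from the primary key alone, which fits the goal of citing each subset by its own PID; a random split would need a stored seed or an explicit membership list instead.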
Dataset Structure
The raw data is a single CSV with columns:
actionnr (integer transaction ID)
merchant_id (string)
average_amount_transaction_day (float)
transaction_amount (float)
is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)
total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo_client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook’s dbrepo_client.py template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
Anonymized features: the PCA components (V1–V28) hide the original variables; we extended the data with domain features but cannot reverse-engineer the raw variables.
Time-bounded: covers only two days of transactions, so it may not capture seasonal patterns.
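As a quick check on the imbalance figure above, the fraud rate can be recomputed from the counts given under Data Sources (492 fraudulent out of 284 807 transactions). The sketch also shows the accuracy of a trivial classifier that labels every transaction as legitimate; the variable names are illustrative only.

```python
n_total = 284_807   # transactions, as stated under Data Sources
n_fraud = 492       # fraudulent transactions, as stated under Data Sources

fraud_rate = n_fraud / n_total
# Accuracy of a classifier that always predicts "legitimate".
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-legitimate accuracy: {baseline_accuracy:.4%}")
```

Because the trivial classifier is already correct on more than 99.8% of transactions, plain accuracy says little about fraud-detection quality on this data.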
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
TU WRD records include the author's ORCID identifier.
Recommended Uses
Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model‐training pipelines, FAIR data practices.
Extension: adding time‐series or deep‐learning models.
Known Issues
Possible temporal leakage if date/time features are not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
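One common way to reduce the temporal-leakage risk noted above is to split chronologically, so that every training row precedes every evaluation row. The following is a minimal sketch under stated assumptions: the `timestamp` field, the helper `chronological_split`, and the toy rows are hypothetical, since the column list in this description does not include a time column.

```python
from datetime import datetime, timedelta


def chronological_split(rows, time_key="timestamp", train_pct=70):
    """Return (earlier, later) rows so that no training row follows a test row."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = len(ordered) * train_pct // 100
    return ordered[:cut], ordered[cut:]


# Toy rows in shuffled order; the start date and amounts are made up.
start = datetime(2013, 9, 1)
rows = [
    {"timestamp": start + timedelta(hours=h), "transaction_amount": float(h)}
    for h in (5, 1, 9, 3, 7, 0, 8, 2, 6, 4)
]

train_rows, test_rows = chronological_split(rows)
latest_train = max(r["timestamp"] for r in train_rows)
earliest_test = min(r["timestamp"] for r in test_rows)
print(latest_train < earliest_test)
```

Any feature aggregated over time (daily averages, six-month frequencies) would also need to be computed only from data available before each transaction for the split to be leak-free.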
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
kgauvin603/creditcard-fraud-detection dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
🔍 Dataset Description: Credit Card Fraud Detection
This dataset is designed for building and evaluating machine learning models for credit card fraud detection. It contains anonymized transaction records where the goal is to classify transactions as fraudulent (1) or non-fraudulent (0) based on several features.
📁 Dataset Overview:
Each row represents a single credit card transaction.
Features include a mix of numerical and transformed variables (e.g., V1 to V28) derived from PCA for confidentiality.
The Amount and Hour_of_Day features represent the transaction value and time, respectively.
The Class column is the target variable:
0 → Legitimate transaction
1 → Fraudulent transaction
✅ Key Highlights:
The dataset contains examples of both classes (0 and 1), as required for evaluating a binary classifier.
Suitable for testing anomaly detection, binary classification, and imbalanced dataset handling techniques like SMOTE or under-sampling.
Ideal for learners, researchers, and practitioners working on fraud detection in real-world scenarios.
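Random under-sampling, one of the imbalance-handling techniques mentioned above, can be sketched with the standard library alone. This is an illustration under assumptions rather than a recommended pipeline: the helper `undersample`, its `ratio` parameter, and the toy rows are hypothetical; only the `Class` and `Amount` column names come from the description.

```python
import random


def undersample(rows, label_key="Class", ratio=1, seed=0):
    """Keep all fraud rows plus a random sample of `ratio` legitimate rows per fraud row."""
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    n_keep = min(len(legit), ratio * len(fraud))
    rng = random.Random(seed)          # fixed seed keeps the sample reproducible
    balanced = fraud + rng.sample(legit, n_keep)
    rng.shuffle(balanced)
    return balanced


# Toy data: 1000 rows, of which 20 are labelled as fraud (values are made up).
rows = [{"Amount": float(i), "Class": 1 if i % 50 == 0 else 0} for i in range(1000)]

balanced = undersample(rows)
n_fraud = sum(r["Class"] for r in balanced)
print(len(balanced), n_fraud)
```

Resampling is normally applied to the training split only, so that evaluation still reflects the original class proportions.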
🧠 Suggested Use Cases:
Model evaluation with metrics like precision, recall, and F1-score (due to class imbalance).
Experimentation with algorithms such as Logistic Regression, Random Forest, XGBoost, and Neural Networks.
Feature engineering and explainability techniques (e.g., SHAP values).
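The evaluation metrics named above follow directly from the confusion counts. A minimal sketch, with a hypothetical helper `precision_recall_f1` and made-up labels, where 1 marks a fraudulent transaction as in the `Class` column:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels: three frauds, two caught, one false alarm.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
```

In practice, scikit-learn's `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics` compute the same quantities.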
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. The dataset is 0.15 GB in size.
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.
https://www.archivemarketresearch.com/privacy-policy
The global credit card fraud detection platform market is experiencing robust growth, driven by the escalating volume of digital transactions and the increasing sophistication of fraudulent activities. While precise figures for market size and CAGR are not provided, based on industry reports and observed trends, a reasonable estimation places the 2025 market size at approximately $15 billion. Considering the rapid adoption of advanced technologies like AI and machine learning in fraud detection, a conservative Compound Annual Growth Rate (CAGR) of 15% is projected for the forecast period (2025-2033). This growth is fueled by several factors, including the rising prevalence of e-commerce, the expanding adoption of mobile payments, and the increasing demand for robust security solutions from both personal and enterprise users. The market is segmented by screening type (manual and automatic) and application (personal and enterprise), with the automatic screening and enterprise segments expected to witness faster growth due to their efficiency and scalability. The competitive landscape is highly dynamic, with a mix of established players like Visa, Mastercard, and FICO, alongside innovative technology companies like Kount, Riskified, and Feedzai. These companies are continuously developing and deploying advanced algorithms and analytics to stay ahead of evolving fraud techniques. Regional growth varies, with North America and Europe currently holding significant market share, though Asia-Pacific is projected to exhibit rapid expansion due to increasing internet penetration and e-commerce adoption in developing economies. Challenges to market growth include the high cost of implementation and maintenance of these platforms, along with the need for continuous updates to counter evolving fraud tactics. 
However, the increasing financial losses incurred due to fraud are incentivizing businesses and consumers to invest in more sophisticated fraud detection solutions, thereby sustaining the market's upward trajectory.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison with other credit card fraud detection datasets.
https://www.datainsightsmarket.com/privacy-policy
The global credit card fraud detection platform market is projected to reach a value of USD 10.3 billion by 2033, exhibiting a CAGR of 12.5% during the forecast period (2023-2033). The increasing adoption of digital payment methods, the growing number of online transactions, and the rising incidences of fraudulent activities are driving the market's growth. However, factors such as data privacy concerns and the high cost of implementing these platforms may hinder the market's growth. The market is segmented based on application, type, and region. The application segments include e-commerce, banking and financial institutions, and other applications. The e-commerce segment holds the largest market share due to the increasing popularity of online shopping. The types segment includes rule-based systems, statistical techniques, machine learning algorithms, and others. Machine learning algorithms are expected to witness the highest growth rate during the forecast period due to their ability to learn from data and identify fraudulent patterns accurately. The regional segments include North America, Europe, Asia Pacific, and the Rest of the World. North America currently dominates the market and is expected to maintain its dominance throughout the forecast period.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Credit Card Transactions Dataset contains more than 20 million credit card transactions made over multiple decades by 2,000 U.S.-resident consumers. The data was generated by an IBM simulation and provides the details of each transaction along with fraud labels.
2) Data Utilization
(1) Characteristics of the Credit Card Transactions Dataset:
• The dataset provides a variety of properties similar to those of real credit card transactions, including transaction amount, time, card information, purchase location, and store category (MCC).
(2) Uses of the Credit Card Transactions Dataset:
• Development of credit card fraud detection models: using the transaction history and properties, you can build a machine-learning model that detects fraudulent (abnormal) transactions.
• Analysis of consumption patterns and risks: the long-term, diverse transaction data can be used to analyze customer consumption behavior and identify risk factors.
This dataset was created by Sriseshagiri
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset was released by [1]. It was collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of Université Libre de Bruxelles (ULB) on big data mining and fraud detection.
[1] Dal Pozzolo, A., Caelen, O., Johnson, R. A., and Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. 2015 IEEE Symposium Series on Computational Intelligence, pp. 159-166, doi: 10.1109/SSCI.2015.33
Open source on Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
The purpose of this dataset is to identify fraudulent credit cards by analyzing their features. Fraudulent credit cards can be detected using ML or DL.
The data was collected from the Weka repository: https://weka.8497.n7.nabble.com/file/n23121/credit_fruad.arff
This dataset was created by AMAN GOEL
https://www.marketresearchintellect.com/privacy-policy
Market Research Intellect's Credit Card Fraud Detection Platform Market Report highlights a valuation of USD 3.5 billion in 2024 and anticipates growth to USD 8.2 billion by 2033, with a CAGR of 10.5% from 2026-2033. Explore insights on demand dynamics, innovation pipelines, and competitive landscapes.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed, labeled records of simulated credit card transactions, including transaction amounts, merchant and cardholder information, and fraud indicators. It is ideal for developing and benchmarking machine learning models aimed at detecting fraudulent activity and reducing financial risk in payment systems. The inclusion of transaction context and cardholder demographics supports advanced analytics and feature engineering.
https://www.polarismarketresearch.com/privacy-policy
The global Credit Card Fraud Detection Platform Market size was valued at USD 3.59 billion in 2024 and is expected to grow at a CAGR of 15.3% from 2025 to 2034.
This dataset was created by Manish Kumar
Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018. It was estimated that merchants and card acquirers lost well over ** billion U.S. dollars, with, so the source adds, roughly ** billion U.S. dollars coming from the United States alone. Note that the figures provided here include both credit card fraud and debit card fraud. The source does not separate between the two, and also did not provide figures on the United States, a country known for its reliance on credit cards.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains simulated credit card transaction records, including detailed information on transaction amounts, merchant details, geolocation, device usage, and fraud labels. It is designed for training and evaluating fraud detection models, supporting the identification of both typical and anomalous transaction patterns. The dataset is ideal for fintech AI development, security analytics, and research into payment fraud behaviors.
We provide you with a data set in CSV format. The data set contains more than 2 lakh (200,000) training instances and 56 thousand test instances. There are 31 input features, labeled V1 to V28 and Amount.
The target variable is labeled Class.
Create a Classification model to predict the target variable Class.
https://www.verifiedmarketresearch.com/privacy-policy/
Credit Card Fraud Detection Platform Market size was valued at USD 3.4 Billion in 2024 and is projected to reach USD 12.44 Billion by 2032, growing at a CAGR of 17.6% during the forecast period 2026 to 2032.
Global Credit Card Fraud Detection Platform Market Drivers:
The market drivers for the credit card fraud detection platform market can be influenced by various factors. These may include:
Rising Incidence of Online Payment Fraud: The increasing number of fraud attempts during online transactions pushes financial institutions to adopt platforms that monitor and detect unauthorized credit card activity in real time.
Growth in E-Commerce Transactions: With more consumers shopping online, the volume of card-not-present transactions rises, creating higher exposure to fraud and driving demand for detection platforms to secure digital payments.
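The headline figures in this market summary can be sanity-checked with the standard compound-growth formula, value_end = value_start × (1 + CAGR)^years. The sketch below assumes compounding starts from the 2024 valuation and runs for eight years to 2032; the summary itself names 2026 to 2032 as the forecast period, so that base year is an assumption, and the function names are hypothetical.

```python
def project_value(start_value, cagr, years):
    """Value after `years` of compound growth at annual rate `cagr`."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1 / years) - 1


years = 2032 - 2024                          # assumed compounding window
projected = project_value(3.4, 0.176, years)  # USD billions
rate = implied_cagr(3.4, 12.44, years)

print(f"projected 2032 value: USD {projected:.2f} billion")
print(f"implied CAGR: {rate:.1%}")
```

Under that assumption, the stated start value, end value, and growth rate agree with one another to within rounding.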