https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes value 1 in case of fraud and 0 otherwise.
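The imbalance figure and the meaning of 'Time' can be checked with a few lines of arithmetic. The following is a minimal sketch using only the counts quoted above; the helper names (`fraud_rate`, `imbalance_ratio`, `day_and_hour`) are illustrative and not part of the dataset or any library. The unrounded rate comes out at roughly 0.1727%, which the description quotes as 0.172%.

```python
# Counts quoted in the dataset description.
FRAUD_COUNT = 492
TOTAL_COUNT = 284_807
SECONDS_PER_DAY = 86_400


def fraud_rate(n_fraud, n_total):
    """Fraction of transactions labelled as fraud (Class == 1)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_fraud / n_total


def imbalance_ratio(n_fraud, n_total):
    """Number of legitimate transactions per fraudulent one."""
    return (n_total - n_fraud) / n_fraud


def day_and_hour(time_seconds):
    """Map a 'Time' offset to (day index, hours into that day).

    Hypothetical helper: hours are counted from the first transaction
    in the dataset, not from midnight, because 'Time' is a relative offset.
    """
    day, remainder = divmod(int(time_seconds), SECONDS_PER_DAY)
    return day, remainder // 3600


rate = fraud_rate(FRAUD_COUNT, TOTAL_COUNT)
ratio = imbalance_ratio(FRAUD_COUNT, TOTAL_COUNT)
print(f"fraud rate: {rate:.4%}")
print(f"legitimate transactions per fraud: {ratio:.1f}")
```

Because 'Time' is only an offset from the first record, any hour-of-day feature derived this way is relative to that first transaction rather than to wall-clock time.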
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on http://mlg.ulb.ac.be/BruFence and http://mlg.ulb.ac.be/ARTML.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.
Data import: Uploaded the full CSV into DBRepo and assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets (training 70%, validation 15%, test 15%) using range-based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non-feature identifiers (actionnr, merchant_id).
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
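The splitting and cleaning steps above can be sketched in plain Python. This is an illustrative reconstruction under stated assumptions, not the project's actual code: it assumes actionnr values are contiguous integers, and the function names (`range_split`, `subset_for`, `clean_row`) are hypothetical. In the real workflow the range filters would run in DBRepo rather than locally.

```python
FLAG_COLUMNS = ("is_declined", "isforeigntransaction",
                "ishighriskcountry", "isfradulent")
ID_COLUMNS = ("actionnr", "merchant_id")
FLAG_VALUES = {"Y": 1, "N": 0}


def range_split(min_id, max_id, train_pct=70, val_pct=15):
    """Return inclusive (low, high) actionnr ranges for the three subsets.

    Integer percentages avoid floating-point rounding at the boundaries;
    the test subset receives whatever remains after training and validation.
    """
    n = max_id - min_id + 1
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = (min_id, min_id + n_train - 1)
    val = (train[1] + 1, train[1] + n_val)
    test = (val[1] + 1, max_id)
    return {"training": train, "validation": val, "test": test}


def subset_for(actionnr, ranges):
    """Name of the subset whose range contains this primary key."""
    for name, (low, high) in ranges.items():
        if low <= actionnr <= high:
            return name
    raise ValueError(f"actionnr {actionnr} is outside all ranges")


def clean_row(row):
    """Map "Y"/"N" flags to 1/0 and drop the non-feature identifiers."""
    cleaned = {}
    for column, value in row.items():
        if column in ID_COLUMNS:
            continue
        cleaned[column] = FLAG_VALUES[value] if column in FLAG_COLUMNS else value
    return cleaned


ranges = range_split(1, 100)
example = {"actionnr": 72, "merchant_id": "m-001",
           "transaction_amount": 19.5, "is_declined": "N", "isfradulent": "Y"}
print(subset_for(example["actionnr"], ranges), clean_row(example))
```

Note the ordering: the subset has to be assigned before `clean_row` runs, because cleaning drops the actionnr key that the range filter relies on.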
Dataset Structure
The raw data is a single CSV with columns:
actionnr (integer transaction ID)
merchant_id (string)
average_amount_transaction_day (float)
transaction_amount (float)
is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)
total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo‐client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook’s dbrepo_client.py template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
The anonymized PCA features (V1–V28) hide the original variables; we extended the data with domain features but cannot reverse-engineer the raw variables.
Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.
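The imbalance noted above is also why accuracy alone says little about a fraud model. The sketch below is illustrative only: the function name `classification_metrics` is hypothetical, and the baseline is a made-up classifier that never flags fraud. It uses the 492 / 284,807 counts from the original dataset description.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical baseline: label every transaction as legitimate.
# All 492 frauds become false negatives; the other 284,315 rows are true negatives.
baseline = classification_metrics(tp=0, fp=0, fn=492, tn=284_807 - 492)
print(baseline)
```

Such a baseline scores above 99.8% accuracy while catching no fraud at all, which is why recall, F1 and ranking metrics such as AUC-ROC are the more informative measures on this data.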
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
TU WRD records include ORCID identifiers for the author.
Recommended Uses
Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model‐training pipelines, FAIR data practices.
Extension: adding time‐series or deep‐learning models.
Known Issues
Possible temporal leakage if date/time features are not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit card fraud can lead to significant financial losses for both individuals and financial institutions. In this paper, we propose a novel method called CTCN, which uses Conditional Tabular Generative Adversarial Networks (CTGAN) and Temporal Convolutional Network (TCN) for credit card fraud detection. Our approach includes an oversampling algorithm that uses CTGAN to balance the dataset, and Neighborhood Cleaning Rule (NCL) to filter out majority class samples that overlap with the minority class. We generate synthetic minority class samples that conform to the original data distribution, resulting in a balanced dataset. We then employ TCN to analyze transaction sequences and capture long-term dependencies between data, revealing potential relationships between transaction sequences, thus achieving accurate credit card fraud detection. Experiments on three public datasets demonstrate that our proposed method outperforms current machine learning and deep learning methods, as measured by recall, F1-Score, and AUC-ROC.
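To illustrate how a TCN can relate a transaction to much earlier ones, here is a minimal pure-Python sketch of the causal dilated convolution that TCNs are built from. It is not the authors' CTCN implementation: the function names, weights and dilation schedule are illustrative assumptions, and a real model would use a deep-learning framework with learned weights.

```python
def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the sequence are treated as zero, so the
    output at step t never depends on inputs later than t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past steps visible after stacking one layer per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


sequence = [1, 2, 3, 4]
layer_out = causal_dilated_conv(sequence, weights=[1, 1], dilation=2)
print(layer_out)
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is the mechanism behind the long-term dependencies the abstract refers to.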
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit card fraud identification is an important issue in risk prevention and control for banks and financial institutions. To establish an efficient credit card fraud identification model, this article studies the factors that affect fraud identification. A credit card fraud identification model based on neural networks is constructed and examined in depth. First, the neural network is deepened to improve the prediction accuracy of the model; second, the hidden-layer width of the network is increased for the same purpose. This article proposes a new fusion neural network model that combines deep and wide neural networks, and applies it to credit card fraud identification. The model is characterized by relatively high prediction accuracy and F1 score. Finally, the model is trained with stochastic gradient descent. On the test set, the proposed method achieves an accuracy of 96.44% and an F1 value of 96.17%, demonstrating good fraud recognition performance. In comparison, the proposed method outperforms the machine learning, ensemble learning, and deep learning models it was evaluated against.
Fraud Detection And Prevention Market Size 2025-2029
The fraud detection and prevention market size is forecast to increase by USD 122.65 billion, at a CAGR of 30.1% between 2024 and 2029.
The market is witnessing significant growth, driven by the increasing adoption of cloud-based services. Businesses are recognizing the benefits of cloud solutions, such as real-time fraud detection, scalability, and cost savings. Additionally, technological advancements in fraud detection and prevention solutions and services are enabling organizations to better protect their assets from sophisticated fraud schemes. However, the complex IT infrastructure of modern businesses poses a challenge in implementing and integrating these solutions effectively. The complexity of the IT infrastructure, which integrates cloud computing, big data, and mobile devices, creates a vast network of devices with insufficient security features.
To capitalize on market opportunities, companies must stay abreast of these trends and invest in advanced fraud detection technologies. Effective implementation and integration of these solutions, coupled with continuous innovation, will be crucial for businesses seeking to mitigate fraud risks and protect their reputation and financial stability. Furthermore, the constant evolution of fraud techniques necessitates continuous innovation and adaptation from solution providers. Encryption techniques and network security protocols form the foundation of robust cybersecurity defenses, while compliance regulations and penetration testing help identify vulnerabilities and strengthen security posture.
What will be the Size of the Fraud Detection And Prevention Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
The market continues to evolve, driven by the constant emergence of new threats and the need for advanced technologies to mitigate risks across various sectors. Real-time fraud alerts, anomaly detection systems, forensic accounting tools, and risk mitigation strategies are integrated into comprehensive solutions that adapt to the ever-changing fraud landscape. Entities rely on these tools to maintain regulatory compliance frameworks and incident response planning, ensuring access control management and vulnerability assessments are up-to-date. Machine learning algorithms and transaction monitoring tools enable the detection of suspicious activity, providing valuable insights into potential threats.
Intrusion detection systems and behavioral biometrics offer real-time protection against cyberattacks and payment fraud, while identity verification methods and risk scoring models help prevent account takeover and data loss. Cybersecurity threat intelligence and authentication protocols enhance the overall security strategy, providing a layered approach to fraud prevention. Fraud investigation techniques and loss prevention metrics enable entities to respond effectively to incidents and minimize the impact of data breaches. Social engineering countermeasures and payment fraud detection solutions further fortify the fraud prevention arsenal, ensuring continuous protection against evolving threats.
The ongoing dynamism of the market demands a proactive approach, with entities staying informed and agile to maintain a strong defense against fraudulent activities.
How is this Fraud Detection And Prevention Industry segmented?
The fraud detection and prevention industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Component
  Solutions
  Services
End-user
  Large enterprise
  SMEs
Application
  Transaction monitoring
  Compliance and risk management
  Identity verification
  Behavioral analytics
  Others
Geography
  North America
    US
    Canada
  Europe
    France
    Germany
    Italy
    Russia
    UK
  APAC
    China
    India
    Japan
  Rest of World (ROW)
By Component Insights
The Solutions segment is estimated to witness significant growth during the forecast period. The market is experiencing significant growth due to escalating cyber threats, increasing regulatory compliance requirements, and the need to mitigate financial losses. Biometric authentication, encryption techniques, machine learning algorithms, and intrusion detection systems are among the key solutions driving market expansion. Regulatory frameworks, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), are mandating robust incident response planning, access control management, and data breach prevention strategies. Vulnerability as
Debt Collection Software Market Size 2024-2028
The debt collection software market size is forecast to increase by USD 2.31 billion at a CAGR of 8.92% between 2023 and 2028.
The market is experiencing significant growth due to the increasing prevalence of non-performing loans (NPLs) worldwide. According to recent reports, the global NPL ratio reached an all-time high of 5.3% in 2020, creating a pressing need for efficient debt collection solutions. In response, market participants are integrating advanced technologies such as artificial intelligence, machine learning, and predictive analytics into their software offerings to streamline the collection process and improve recovery rates. However, the high cost of debt collection software remains a significant challenge for small and medium-sized enterprises (SMEs) and startups. The upfront investment required for implementing these solutions can be prohibitive, limiting their adoption.
Furthermore, the complexity of the software and the need for specialized expertise to operate it effectively can add to the overall cost and implementation time. To capitalize on the market opportunities presented by the growing NPL problem and the integration of advanced technologies, companies must focus on offering affordable, user-friendly solutions that cater to the unique needs of SMEs and startups. By doing so, they can differentiate themselves from competitors and gain a competitive edge in the market.
What will be the Size of the Debt Collection Software Market during the forecast period?
The market continues to evolve, with customer service and collection process automation playing pivotal roles in enhancing efficiency and effectiveness. Debt recovery, reporting and analytics, cloud computing, data security, and regulatory compliance are integral components, ensuring seamless integration and optimization. Machine learning and collection workflows facilitate advanced fraud detection, while collection tactics adapt to consumer debt scenarios. Collection agencies leverage technology for compliance management and collection strategies, encompassing financial services, business debt, and commercial debt.
Predictive analytics and debt portfolio management enable proactive debt collection and risk management. Virtual collections, invoice financing, and account recovery solutions further expand the market's reach, with remote collections, artificial intelligence, and legal compliance shaping the future landscape.
How is this Debt Collection Software Industry segmented?
The debt collection software industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Deployment
  On-premises
  Cloud-based
Industry Application
  Banking and Financial Services
  Healthcare
  Retail
  Telecom
  Government
  Others
Software Component
  Software
  Service
Geography
  North America
    US
    Canada
  Europe
    France
    Germany
    Italy
    UK
  Middle East and Africa
    Egypt
    KSA
    Oman
    UAE
  APAC
    China
    India
    Japan
  South America
    Argentina
    Brazil
  Rest of World (ROW)
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period.
In the debt collection industry, on-premises debt collection software solutions hold a prominent position in the global market. These solutions cater to organizations that value internal control, data security, and customization. Deployed directly within an organization, they offer users extensive autonomy over their debt collection processes. Compliance with stringent data privacy regulations is a major concern for industries such as finance and healthcare, making on-premises software a preferred choice. Companies like DAKCS Software Systems Inc. implement these solutions to manage delinquent accounts, credit card debt, and business debt. Collection process automation, reporting and analytics, and customer relationship management are integral features.
Collection tactics, regulatory compliance, and compliance management are also crucial elements. Machine learning and predictive analytics enable advanced debt portfolio management and collection strategies. Collection call automation, skip tracing, and fraud detection further enhance efficiency. Virtual collections, invoice financing, and account recovery are additional functionalities. Artificial intelligence and legal compliance ensure effective risk management and collections management. Collection automation, debt collection laws, and debt collection regulations are addressed. Medical debt, consumer debt, and student loan debt are effectively managed. Virtual assistant technology offers assistance in d
Automatic Fare Collection Systems Market Size 2024-2028
The automatic fare collection (AFC) systems market size is forecast to increase by USD 5.85 billion at a CAGR of 10.87% between 2023 and 2028. The market is experiencing significant growth due to the successful implementation of various technologies in transportation projects. With a rising focus on cashless transactions, there is an increasing demand for user-friendly AFC systems that offer advanced fraud detection techniques and ensure data privacy and security. Big data analytics, artificial intelligence (AI), machine learning, Internet of Things (IoT), cloud computing, voice recognition, and biometric identification are some of the key technologies driving innovation in the AFC industry. These technologies enable real-time data processing, improved customer experience, and enhanced security features. As transportation infrastructure continues to evolve, AFC systems will play a crucial role in facilitating seamless and efficient travel experiences for users.
The market is witnessing significant growth due to the increasing adoption of contactless technology, smart cards, and QR code-based ticketing in the transportation sector. These advanced technologies offer numerous benefits, including operational efficiency, ease of payment, and reduced operational expenditures. AFC systems provide an end-to-end solution for transportation providers, enabling seamless fare collection through various modes such as pre-paid/credit, debit cards, and UPI payment mode. Contactless technology, which includes NFC and RFID, plays a crucial role in enabling quick and secure transactions.
Moreover, smart cards and magnetic stripe cards are popular AFC solutions, offering contactless and contact-based payment options, respectively. QR code-based ticketing is another innovative solution that allows passengers to scan a code using their smartphones to purchase and validate tickets. Vending machines and kiosks equipped with AFC systems offer additional convenience for passengers, enabling them to purchase tickets and recharge their smart cards without the need for manpower. These systems also provide real-time datasets and records, enabling transportation providers to monitor traffic flow, optimize infrastructure repair, and prevent fraud. The growth of AFC systems in the transportation sector is driven by several factors, including the increasing penetration of smartphones and the shift towards cashless payment methods.
Similarly, smart cities and municipal operations are also adopting AFC systems to enhance operational efficiency and improve the overall transportation experience. Despite the numerous benefits, there are challenges associated with AFC systems, including security concerns and the need for continuous maintenance and updates. However, the advantages of AFC systems far outweigh the challenges, making them an essential component of modern transportation infrastructure. In conclusion, the AFC systems market is poised for continued growth as transportation providers seek to enhance operational efficiency, reduce operational expenditures, and offer passengers a convenient and secure payment experience. The adoption of contactless technology, smart cards, and QR code-based ticketing is expected to drive the growth of the market, with vending machines and kiosks offering additional convenience for passengers. The shift towards digital tickets and the increasing penetration of smartphones and cashless payment methods are also key factors contributing to the growth of the AFC systems market.
Market Segmentation
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Component
  Hardware
  Software
Application
  Railways
  Parking
  Entertainment
  Others
Geography
  North America
    US
  Europe
    Germany
    UK
  APAC
    China
    Japan
  South America
  Middle East and Africa
By Component Insights
The hardware segment is estimated to witness significant growth during the forecast period. Automatic Fare Collection (AFC) systems have become an integral part of modern transportation infrastructure, enabling contactless and efficient fare collection. These systems utilize various hardware components to ensure seamless and accurate processing of passenger fares. Facial recognition and fingerprinting technologies are increasingly being integrated into AFC gates for identity verification, adding an extra layer of security. AFC gates serve as crucial entry and exit points in transportation hubs, including airports, metro stations, and bus terminals. These gates employ sensors to detect passenger movement and allow access only after valid fare payment. Ticket v