7 datasets found
  1. Credit card fraud detection Date 25th of June 2015

    • kaggle.com
    Updated Oct 29, 2023
    Cite
    Zohair ahmed (2023). Credit card fraud detection Date 25th of June 2015 [Dataset]. https://www.kaggle.com/datasets/qnqfbqfqo/credit-card-fraud-detection-date-25th-of-june-2015
    Dataset updated
    Oct 29, 2023
    Dataset provided by
    Kaggle
    Authors
    Zohair ahmed
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
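
    The imbalance figures above can be checked with a few lines of arithmetic. The sketch below uses only the two counts quoted in the description; the all-negative "baseline" classifier is a hypothetical illustration, not something the dataset authors describe. It shows why plain accuracy is a poor metric for data this skewed.

```python
# Counts quoted in the dataset description.
N_TRANSACTIONS = 284_807
N_FRAUDS = 492


def fraud_rate(n_frauds: int, n_total: int) -> float:
    """Share of transactions labelled as fraud (Class == 1)."""
    return n_frauds / n_total


def all_negative_baseline(n_frauds: int, n_total: int) -> dict:
    """Metrics of a hypothetical classifier that never predicts fraud."""
    true_negatives = n_total - n_frauds
    return {
        "accuracy": true_negatives / n_total,  # looks excellent
        "recall": 0.0,                         # yet catches no fraud at all
    }


rate = fraud_rate(N_FRAUDS, N_TRANSACTIONS)
baseline = all_negative_baseline(N_FRAUDS, N_TRANSACTIONS)
print(f"fraud rate: {rate:.4%}")
print(f"baseline accuracy: {baseline['accuracy']:.4%}, recall: {baseline['recall']:.0%}")
```

    Because such a baseline already scores above 99.8% accuracy, recall, precision, or F1 on the fraud class are more informative for this dataset.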

    It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes value 1 in case of fraud and 0 otherwise.
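
    As a rough illustration of example-dependent cost-sensitive learning, the sketch below scores a set of predictions by monetary cost rather than by error count. The cost scheme is an assumption made for this example (a missed fraud costs the transaction 'Amount', and every raised alert costs a fixed, hypothetical ADMIN_COST); the dataset description does not prescribe one.

```python
from typing import Sequence

ADMIN_COST = 5.0  # hypothetical fixed cost of investigating one alert


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               amounts: Sequence[float], admin_cost: float = ADMIN_COST) -> float:
    """Sum per-transaction costs under the assumed cost scheme.

    - alert raised (pred == 1): pay admin_cost, whether or not it was fraud
    - missed fraud (pred == 0, truth == 1): lose the transaction amount
    - legitimate transaction left alone: no cost
    """
    cost = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            cost += admin_cost
        elif truth == 1:
            cost += amount
    return cost


# Toy data standing in for 'Class' labels, model predictions, and 'Amount'.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 1]
amounts = [20.0, 300.0, 150.0, 40.0]
print(total_cost(y_true, y_pred, amounts))
```

    Under a scheme like this, missing one large fraud can outweigh many false alarms, which is the point of making the cost depend on each example.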

    The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on http://mlg.ulb.ac.be/BruFence and http://mlg.ulb.ac.be/ARTML.

  2. Credit Card Fraud Detection

    • test.researchdata.tuwien.ac.at
    • zenodo.org
    • +1more
    csv, json, pdf +2
    Updated Apr 28, 2025
    Cite
    Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
    Available download formats: text/markdown, csv, pdf, txt, json
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Ajdina Grizhja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description


    1. Dataset Description

    Research Domain
    This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

    Purpose
    The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

    Data Sources
    We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.

    Method of Dataset Preparation

    1. Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

    2. Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

    3. Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

    4. Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

    5. Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
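
    Steps 3 and 4 above can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' DBRepo workflow: the helper names (split_ranges, clean_row) are hypothetical, and it assumes actionnr values are consecutive integers starting at 1, which the description does not state.

```python
from typing import Dict, Tuple

FLAG_COLUMNS = ("is_declined", "isforeigntransaction",
                "ishighriskcountry", "isfradulent")
ID_COLUMNS = ("actionnr", "merchant_id")


def split_ranges(n_rows: int, train_pct: int = 70,
                 val_pct: int = 15) -> Dict[str, Tuple[int, int]]:
    """Return inclusive (first, last) actionnr ranges for each subset.

    Assumes keys run 1..n_rows without gaps; the test split takes the rest.
    Integer percentages avoid floating-point rounding at the boundaries.
    """
    train_end = n_rows * train_pct // 100
    val_end = train_end + n_rows * val_pct // 100
    return {
        "training": (1, train_end),
        "validation": (train_end + 1, val_end),
        "test": (val_end + 1, n_rows),
    }


def clean_row(row: dict) -> dict:
    """Map "Y" flags to 1 (anything else to 0) and drop identifier columns."""
    cleaned = {}
    for key, value in row.items():
        if key in ID_COLUMNS:
            continue
        if key in FLAG_COLUMNS:
            cleaned[key] = 1 if value == "Y" else 0
        else:
            cleaned[key] = value
    return cleaned


ranges = split_ranges(1000)
row = {"actionnr": 7, "merchant_id": "m-01", "transaction_amount": 12.5,
       "is_declined": "N", "isfradulent": "Y"}
print(ranges)
print(clean_row(row))
```

    A range-based split is deterministic and easy to cite, but it only approximates a random split if the key order is unrelated to the fraud label.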

    2. Technical Details

    Dataset Structure

    • The raw data is a single CSV with columns:

      • actionnr (integer transaction ID)

      • merchant_id (string)

      • average_amount_transaction_day (float)

      • transaction_amount (float)

      • is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

      • total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

    Naming Conventions

    • All columns use lowercase snake_case.

    • Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

    • Files in the code repo follow a clear structure:

      ├── data/         # local copies only; raw data lives in DBRepo 
      ├── notebooks/Task.ipynb 
      ├── models/rf_model_v1.joblib 
      ├── outputs/        # confusion_matrix.png, roc_curve.png, predictions.csv 
      ├── README.md 
      ├── requirements.txt 
      └── codemeta.json 
      

    Required Software

    • Python 3.9+

    • pandas, numpy (data handling)

    • scikit-learn (modeling, metrics)

    • matplotlib (visualizations)

    • dbrepo‐client.py (DBRepo API)

    • requests (TU WRD API)

    Additional Resources

    3. Further Details

    Data Limitations

    • Highly imbalanced: only ~0.17% of transactions are fraudulent.

    • Anonymized PCA features (V1-V28) are hidden; we extended the data with domain features but cannot reverse-engineer the raw variables.

    • Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.

    Licensing and Attribution

    • Raw data: CC-0 (per Kaggle terms)

    • Code & notebooks: MIT License

    • Model artifacts & outputs: CC-BY 4.0

    • TU WRD records include ORCID identifiers for the author.

    Recommended Uses

    • Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

    • Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

    • Extension: adding time‐series or deep‐learning models.

    Known Issues

    • Possible temporal leakage if date/time features are not handled correctly.

    • Model performance may degrade on live data due to concept drift.

    • Binary flags may oversimplify nuanced transaction outcomes.

  3. A Novel Credit Card Fraud Detection Method

    • zenodo.org
    bin
    Updated Jul 19, 2023
    Cite
    Xiaoyan Zhao (2023). A Novel Credit Card Fraud Detection Method [Dataset]. http://doi.org/10.5281/zenodo.8159789
    Available download formats: bin
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xiaoyan Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Credit card fraud can lead to significant financial losses for both individuals and financial institutions. In this paper, we propose a novel method called CTCN, which uses Conditional Tabular Generative Adversarial Networks (CTGAN) and Temporal Convolutional Network (TCN) for credit card fraud detection. Our approach includes an oversampling algorithm that uses CTGAN to balance the dataset, and Neighborhood Cleaning Rule (NCL) to filter out majority class samples that overlap with the minority class. We generate synthetic minority class samples that conform to the original data distribution, resulting in a balanced dataset. We then employ TCN to analyze transaction sequences and capture long-term dependencies between data, revealing potential relationships between transaction sequences, thus achieving accurate credit card fraud detection. Experiments on three public datasets demonstrate that our proposed method outperforms current machine learning and deep learning methods, as measured by recall, F1-Score, and AUC-ROC.
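
    The building block of a TCN is the causal dilated convolution, which lets the output at step t depend only on the current and earlier transactions. The sketch below is a plain-Python illustration of that single operation on toy inputs; it is not the authors' CTCN implementation, and the function name is hypothetical.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float],
                        dilation: int = 1) -> List[float]:
    """Compute y[t] = sum_i weights[i] * x[t - i * dilation] with zero left-padding.

    Only present and past inputs contribute, so no future information leaks.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:  # positions before the sequence start act as zeros
                acc += w * x[j]
        out.append(acc)
    return out


sequence = [1.0, 2.0, 3.0, 4.0]
result = causal_dilated_conv(sequence, weights=[1.0, 1.0], dilation=2)
print(result)
```

    Stacking such layers with dilations 1, 2, 4, and so on widens the receptive field exponentially with depth, which is how a TCN can cover long transaction histories.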

  4. Example of the data set used in this article.

    • plos.figshare.com
    xls
    Updated Oct 28, 2024
    Cite
    Shan Jiang; Xiaofeng Liao; Yuming Feng; Zilin Gao; Babatunde Oluwaseun Onasanya (2024). Example of the data set used in this article. [Dataset]. http://doi.org/10.1371/journal.pone.0311987.t001
    Available download formats: xls
    Dataset updated
    Oct 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Shan Jiang; Xiaofeng Liao; Yuming Feng; Zilin Gao; Babatunde Oluwaseun Onasanya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Credit card fraud identification is an important issue in risk prevention and control for banks and financial institutions. To establish an efficient credit card fraud identification model, this article studies the relevant factors that affect fraud identification and constructs a model based on neural networks. First, the layers of the neural network are deepened to improve the prediction accuracy of the model; second, the hidden-layer width of the network is increased for the same purpose. The article proposes a new fusion neural network model that combines deep and wide neural networks, and applies it to credit card fraud identification. The model is characterized by relatively high prediction accuracy and F1 score. Finally, the model is trained with stochastic gradient descent. On the test set, the proposed method reaches an accuracy of 96.44% and an F1 value of 96.17%, demonstrating good fraud recognition performance. In the authors' comparison, the proposed method outperforms machine learning models, ensemble learning models, and deep learning models.
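
    For reference, accuracy and F1 are both derived from the confusion matrix. The sketch below uses made-up counts, not the article's results, to show how the two metrics are computed.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (fraud) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 80, 20, 20, 880
print(f"accuracy = {accuracy(tp, fp, fn, tn):.3f}")
print(f"F1       = {f1_score(tp, fp, fn):.3f}")
```

    F1 ignores true negatives, so the two numbers are only close to each other, as they are in the reported results, when the evaluation set is reasonably balanced.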

  5. Fraud Detection And Prevention Market Analysis, Size, and Forecast...

    • technavio.com
    Updated Jul 11, 2025
    Cite
    Technavio (2025). Fraud Detection And Prevention Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, Russia, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/fraud-detection-and-prevention-market-analysis
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description

    Fraud Detection And Prevention Market Size 2025-2029

    The fraud detection and prevention market size is forecast to increase by USD 122.65 billion, at a CAGR of 30.1% between 2024 and 2029.
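
    The two headline figures can be related with the standard compound-growth formula. The sketch below assumes the CAGR applies over five annual periods (2024 to 2029) and that USD 122.65 billion is the difference between the end and start values; the excerpt does not state the base-year market size, so the derived value is only an implied estimate.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def implied_start(increment: float, rate: float, years: int) -> float:
    """Start value implied by an absolute increase and a CAGR.

    Rearranged from: increment = start * ((1 + rate) ** years - 1)
    """
    return increment / ((1 + rate) ** years - 1)


INCREMENT_USD_BN = 122.65  # from the report excerpt
RATE = 0.301               # from the report excerpt
YEARS = 5                  # assumption: 2024 -> 2029 counted as five periods

start = implied_start(INCREMENT_USD_BN, RATE, YEARS)
end = start + INCREMENT_USD_BN
print(f"implied start: {start:.2f}, implied end: {end:.2f} (USD billion)")
print(f"round-trip CAGR: {cagr(start, end, YEARS):.3f}")
```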

    The market is witnessing significant growth, driven by the increasing adoption of cloud-based services. Businesses are recognizing the benefits of cloud solutions, such as real-time fraud detection, scalability, and cost savings. Additionally, technological advancements in fraud detection and prevention solutions and services are enabling organizations to better protect their assets from sophisticated fraud schemes. However, the complex IT infrastructure of modern businesses poses a challenge in implementing and integrating these solutions effectively. The complexity of the IT infrastructure, which integrates cloud computing, big data, and mobile devices, creates a vast network of devices with insufficient security features.
    To capitalize on market opportunities, companies must stay abreast of these trends and invest in advanced fraud detection technologies. Effective implementation and integration of these solutions, coupled with continuous innovation, will be crucial for businesses seeking to mitigate fraud risks and protect their reputation and financial stability. Furthermore, the constant evolution of fraud techniques necessitates continuous innovation and adaptation from solution providers. Encryption techniques and network security protocols form the foundation of robust cybersecurity defenses, while compliance regulations and penetration testing help identify vulnerabilities and strengthen security posture.
    

    What will be the Size of the Fraud Detection And Prevention Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    The market continues to evolve, driven by the constant emergence of new threats and the need for advanced technologies to mitigate risks across various sectors. Real-time fraud alerts, anomaly detection systems, forensic accounting tools, and risk mitigation strategies are integrated into comprehensive solutions that adapt to the ever-changing fraud landscape. Entities rely on these tools to maintain regulatory compliance frameworks and incident response planning, ensuring access control management and vulnerability assessments are up-to-date. Machine learning algorithms and transaction monitoring tools enable the detection of suspicious activity, providing valuable insights into potential threats.

    Intrusion detection systems and behavioral biometrics offer real-time protection against cyberattacks and payment fraud, while identity verification methods and risk scoring models help prevent account takeover and data loss. Cybersecurity threat intelligence and authentication protocols enhance the overall security strategy, providing a layered approach to fraud prevention. Fraud investigation techniques and loss prevention metrics enable entities to respond effectively to incidents and minimize the impact of data breaches. Social engineering countermeasures and payment fraud detection solutions further fortify the fraud prevention arsenal, ensuring continuous protection against evolving threats.

    The ongoing dynamism of the market demands a proactive approach, with entities staying informed and agile to maintain a strong defense against fraudulent activities.

    How is this Fraud Detection And Prevention Industry segmented?

    The fraud detection and prevention industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Component
      • Solutions
      • Services
    • End-user
      • Large enterprise
      • SMEs
    • Application
      • Transaction monitoring
      • Compliance and risk management
      • Identity verification
      • Behavioral analytics
      • Others
    • Geography
      • North America
        • US
        • Canada
      • Europe
        • France
        • Germany
        • Italy
        • Russia
        • UK
      • APAC
        • China
        • India
        • Japan
      • Rest of World (ROW)

    By Component Insights

    The Solutions segment is estimated to witness significant growth during the forecast period. The market is experiencing significant growth due to escalating cyber threats, increasing regulatory compliance requirements, and the need to mitigate financial losses. Biometric authentication, encryption techniques, machine learning algorithms, and intrusion detection systems are among the key solutions driving market expansion. Regulatory frameworks, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), are mandating robust incident response planning, access control management, and data breach prevention strategies. Vulnerability as

  6. Debt Collection Software Market Analysis, Size, and Forecast 2024-2028:...

    • technavio.com
    Cite
    Technavio, Debt Collection Software Market Analysis, Size, and Forecast 2024-2028: North America (US and Canada), Europe (France, Germany, Italy, and UK), Middle East and Africa (Egypt, KSA, Oman, and UAE), APAC (China, India, and Japan), South America (Argentina and Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/debt-collection-software-market-industry-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Canada, Saudi Arabia, Germany, United States, Global
    Description

    Debt Collection Software Market Size 2024-2028

    The debt collection software market size is forecast to increase by USD 2.31 billion at a CAGR of 8.92% between 2023 and 2028.

    The market is experiencing significant growth due to the increasing prevalence of non-performing loans (NPLs) worldwide. According to recent reports, the global NPL ratio reached an all-time high of 5.3% in 2020, creating a pressing need for efficient debt collection solutions. In response, market participants are integrating advanced technologies such as artificial intelligence, machine learning, and predictive analytics into their software offerings to streamline the collection process and improve recovery rates. However, the high cost of debt collection software remains a significant challenge for small and medium-sized enterprises (SMEs) and startups. The upfront investment required for implementing these solutions can be prohibitive, limiting their adoption.
    Furthermore, the complexity of the software and the need for specialized expertise to operate it effectively can add to the overall cost and implementation time. To capitalize on the market opportunities presented by the growing NPL problem and the integration of advanced technologies, companies must focus on offering affordable, user-friendly solutions that cater to the unique needs of SMEs and startups. By doing so, they can differentiate themselves from competitors and gain a competitive edge in the market.
    

    What will be the Size of the Debt Collection Software Market during the forecast period?

    The market continues to evolve, with customer service and collection process automation playing pivotal roles in enhancing efficiency and effectiveness. Debt recovery, reporting and analytics, cloud computing, data security, and regulatory compliance are integral components, ensuring seamless integration and optimization. Machine learning and collection workflows facilitate advanced fraud detection, while collection tactics adapt to consumer debt scenarios. Collection agencies leverage technology for compliance management and collection strategies, encompassing financial services, business debt, and commercial debt.
    Predictive analytics and debt portfolio management enable proactive debt collection and risk management. Virtual collections, invoice financing, and account recovery solutions further expand the market's reach, with remote collections, artificial intelligence, and legal compliance shaping the future landscape.
    

    How is this Debt Collection Software Industry segmented?

    The debt collection software industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    • Deployment
      • On-premises
      • Cloud-based
    • Industry Application
      • Banking and Financial Services
      • Healthcare
      • Retail
      • Telecom
      • Government
      • Others
    • Software Component
      • Software
      • Service
    • Geography
      • North America
        • US
        • Canada
      • Europe
        • France
        • Germany
        • Italy
        • UK
      • Middle East and Africa
        • Egypt
        • KSA
        • Oman
        • UAE
      • APAC
        • China
        • India
        • Japan
      • South America
        • Argentina
        • Brazil
      • Rest of World (ROW)
    

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period.

    In the debt collection industry, on-premises debt collection software solutions hold a prominent position in the global market. These solutions cater to organizations that value internal control, data security, and customization. Deployed directly within an organization, they offer users extensive autonomy over their debt collection processes. Compliance with stringent data privacy regulations is a major concern for industries such as finance and healthcare, making on-premises software a preferred choice. Companies like DAKCS Software Systems Inc. implement these solutions to manage delinquent accounts, credit card debt, and business debt. Collection process automation, reporting and analytics, and customer relationship management are integral features.

    Collection tactics, regulatory compliance, and compliance management are also crucial elements. Machine learning and predictive analytics enable advanced debt portfolio management and collection strategies. Collection call automation, skip tracing, and fraud detection further enhance efficiency. Virtual collections, invoice financing, and account recovery are additional functionalities. Artificial intelligence and legal compliance ensure effective risk management and collections management. Collection automation, debt collection laws, and debt collection regulations are addressed. Medical debt, consumer debt, and student loan debt are effectively managed. Virtual assistant technology offers assistance in d

  7. Automatic Fare Collection (AFC) Systems Market Analysis North America,...

    • technavio.com
    Updated Jul 15, 2024
    Cite
    Technavio (2024). Automatic Fare Collection (AFC) Systems Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, Germany, China, UK, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/afc-systems-market-analysis
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Germany, Japan, United States, Global
    Description

    Automatic Fare Collection Systems Market Size 2024-2028

    The automatic fare collection (AFC) systems market size is forecast to increase by USD 5.85 billion at a CAGR of 10.87% between 2023 and 2028. The market is experiencing significant growth due to the successful implementation of various technologies in transportation projects. With a rising focus on cashless transactions, there is an increasing demand for user-friendly AFC systems that offer advanced fraud detection techniques and ensure data privacy and security. Big data analytics, artificial intelligence (AI), machine learning, Internet of Things (IoT), cloud computing, voice recognition, and biometric identification are some of the key technologies driving innovation in the AFC industry. These technologies enable real-time data processing, improved customer experience, and enhanced security features. As transportation infrastructure continues to evolve, AFC systems will play a crucial role in facilitating seamless and efficient travel experiences for users.

    The market is witnessing significant growth due to the increasing adoption of contactless technology, smart cards, and QR code-based ticketing in the transportation sector. These advanced technologies offer numerous benefits, including operational efficiency, ease of payment, and reduced operational expenditures. AFC systems provide an end-to-end solution for transportation providers, enabling seamless fare collection through various modes such as pre-paid/credit, debit cards, and UPI payment mode. Contactless technology, which includes NFC and RFID, plays a crucial role in enabling quick and secure transactions.

    Moreover, smart cards and magnetic stripe cards are popular AFC solutions, offering contactless and contact-based payment options, respectively. QR code-based ticketing is another innovative solution that allows passengers to scan a code using their smartphones to purchase and validate tickets. Vending machines and kiosks equipped with AFC systems offer additional convenience for passengers, enabling them to purchase tickets and recharge their smart cards without the need for manpower. These systems also provide real-time datasets and records, enabling transportation providers to monitor traffic flow, optimize infrastructure repair, and prevent fraud. The growth of AFC systems in the transportation sector is driven by several factors, including the increasing penetration of smartphones and the shift towards cashless payment methods.

    Similarly, smart cities and municipal operations are also adopting AFC systems to enhance operational efficiency and improve the overall transportation experience. Despite the numerous benefits, there are challenges associated with AFC systems, including security concerns and the need for continuous maintenance and updates. However, the advantages of AFC systems far outweigh the challenges, making them an essential component of modern transportation infrastructure. In conclusion, the AFC systems market is poised for continued growth as transportation providers seek to enhance operational efficiency, reduce operational expenditures, and offer passengers a convenient and secure payment experience. The adoption of contactless technology, smart cards, and QR code-based ticketing is expected to drive the growth of the market, with vending machines and kiosks offering additional convenience for passengers. The shift towards digital tickets and the increasing penetration of smartphones and cashless payment methods are also key factors contributing to the growth of the AFC systems market.

    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    • Component
      • Hardware
      • Software
    • Application
      • Railways
      • Parking
      • Entertainment
      • Others
    • Geography
      • North America
        • US
      • Europe
        • Germany
        • UK
      • APAC
        • China
        • Japan
      • South America
      • Middle East and Africa

    By Component Insights

    The hardware segment is estimated to witness significant growth during the forecast period. Automatic Fare Collection (AFC) systems have become an integral part of modern transportation infrastructure, enabling contactless and efficient fare collection. These systems utilize various hardware components to ensure seamless and accurate processing of passenger fares. Facial recognition and fingerprinting technologies are increasingly being integrated into AFC gates for identity verification, adding an extra layer of security. AFC gates serve as crucial entry and exit points in transportation hubs, including airports, metro stations, and bus terminals. These gates employ sensors to detect passenger movement and allow access only after valid fare payment. Ticket v
