3 datasets found
  1. Default of Credit Card Clients Dataset

    • kaggle.com
    Updated Nov 3, 2016
    Cite
    UCI Machine Learning (2016). Default of Credit Card Clients Dataset [Dataset]. https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset/discussion
    Dataset updated
    Nov 3, 2016
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    UCI Machine Learning
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Information

    This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

    Content

    There are 25 variables:

    • ID: ID of each client
    • LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)
    • SEX: Gender (1=male, 2=female)
    • EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
    • MARRIAGE: Marital status (1=married, 2=single, 3=others)
    • AGE: Age in years
    • PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)
    • PAY_2: Repayment status in August, 2005 (scale same as above)
    • PAY_3: Repayment status in July, 2005 (scale same as above)
    • PAY_4: Repayment status in June, 2005 (scale same as above)
    • PAY_5: Repayment status in May, 2005 (scale same as above)
    • PAY_6: Repayment status in April, 2005 (scale same as above)
    • BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)
    • BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)
    • BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)
    • BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)
    • BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)
    • BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)
    • PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)
    • PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)
    • PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)
    • PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)
    • PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)
    • PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)
    • default.payment.next.month: Default payment (1=yes, 0=no)
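
    The coded columns above can be turned into readable labels with a small lookup. The following is a minimal sketch, assuming each record is available as a Python dict keyed by the column names listed above; the helper names `describe_pay_status` and `decode_row` are hypothetical and not part of the dataset or any library. Codes that the description does not list are passed through and flagged rather than guessed.

```python
# Code-to-label mappings copied from the variable list above.
SEX = {1: "male", 2: "female"}
EDUCATION = {
    1: "graduate school",
    2: "university",
    3: "high school",
    4: "others",
    5: "unknown",
    6: "unknown",
}
MARRIAGE = {1: "married", 2: "single", 3: "others"}


def describe_pay_status(code):
    """Translate a PAY_* repayment code into text, per the documented scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        return f"payment delay for {code} month(s)"
    if code == 9:
        return "payment delay for nine months and above"
    # Anything else is not covered by the description, so it is only flagged.
    return f"undocumented code {code}"


def decode_row(row):
    """Return a copy of `row` with the coded columns replaced by labels."""
    decoded = dict(row)
    for column, mapping in (("SEX", SEX), ("EDUCATION", EDUCATION), ("MARRIAGE", MARRIAGE)):
        if column in decoded:
            code = decoded[column]
            decoded[column] = mapping.get(code, f"undocumented code {code}")
    for column in ("PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"):
        if column in decoded:
            decoded[column] = describe_pay_status(decoded[column])
    return decoded


# Hypothetical record for illustration only; not taken from the dataset.
example = decode_row({"SEX": 2, "EDUCATION": 1, "MARRIAGE": 2, "PAY_0": 2, "PAY_2": -1})
```

    Keeping the mappings as plain dicts makes it easy to compare them against the description if the coding scheme is ever revised.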

    Inspiration

    Some ideas for exploration:

    1. How does the probability of default payment vary by categories of different demographic variables?
    2. Which variables are the strongest predictors of default payment?
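
    As a starting point for the first question, the default rate can be computed per category of a demographic variable. This is a minimal sketch using only the standard library; the function name `default_rate_by` and the sample rows are hypothetical and exist only to show the calculation. It does not address the second question, which would need a fitted model or a feature-importance method.

```python
from collections import defaultdict


def default_rate_by(rows, column, target="default.payment.next.month"):
    """Return {category: share of rows with target == 1} for one column."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        defaults[key] += row[target]  # target is coded 1=yes, 0=no
    return {key: defaults[key] / totals[key] for key in totals}


# Hypothetical rows for illustration only; not taken from the dataset.
sample_rows = [
    {"EDUCATION": 1, "default.payment.next.month": 0},
    {"EDUCATION": 1, "default.payment.next.month": 1},
    {"EDUCATION": 2, "default.payment.next.month": 0},
    {"EDUCATION": 2, "default.payment.next.month": 0},
]
rates = default_rate_by(sample_rows, "EDUCATION")
```

    The same function works for SEX, MARRIAGE, or a binned AGE column, since it only needs a categorical key per row.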

    Acknowledgements

    Any publications based on this dataset should acknowledge the following:

    Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

    The original dataset can be found at the UCI Machine Learning Repository.

  2. Credit Card Fraud Dataset

    • kaggle.com
    Updated Jan 28, 2025
    Cite
    Vishal Painjane (2025). Credit Card Fraud Dataset [Dataset]. https://www.kaggle.com/datasets/vishalpainjane/dataset101
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vishal Painjane
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Credit risk assessment remains a critical function within financial services, influencing lending decisions, portfolio risk management, and regulatory compliance. This dataset integrates multiple categories of financial, transactional, and behavioral data to enable advanced machine learning applications in financial risk modeling.

    Data Composition and Structure

    The dataset comprises a total of 1,212 distinct features, systematically grouped into four principal categories, alongside a binary target variable. Each feature category represents a specific dimension of credit risk assessment, reflecting both internal transactional data and externally sourced credit bureau information.

    Target Variable

    The dependent variable, denoted as bad_flag, represents a binary risk classification outcome associated with each customer account. The variable takes the following values:

    • 0: Denotes a low-risk, creditworthy customer
    • 1: Denotes a high-risk, default-prone customer

    This variable serves as the target for binary classification models aimed at predicting credit risk propensity.

    Feature Groups

    Category                 Number of Features   Description
    Transaction Attributes   664                  Customer-level transaction behavior, repayment patterns, financial habits
    Bureau Credit Data       452                  Credit scores, external bureau records, delinquency flags, historical credit data
    Bureau Enquiries         50                   Credit inquiry history, frequency and type of external credit applications
    ONUS Attributes          48                   Internal bank relationship metrics, account engagement indicators

    Each feature within a category follows a systematic sequential naming convention (e.g., transaction_attribute_1, bureau_1), facilitating programmatic identification and group-level analysis.
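
    The sequential naming convention makes it possible to recover the feature groups from the column names alone. The following is a minimal sketch, assuming every feature name has the form <prefix>_<number> as in the two examples given; the function name `group_features` is hypothetical, and the prefixes used for the other two categories are not stated here, so only the documented ones appear in the example.

```python
import re
from collections import defaultdict


def group_features(column_names):
    """Group names of the form <prefix>_<number> by their prefix."""
    groups = defaultdict(list)
    for name in column_names:
        match = re.fullmatch(r"(.+)_(\d+)", name)
        if match:
            groups[match.group(1)].append(name)
        else:
            # Columns without a numeric suffix, such as the target bad_flag.
            groups["(other)"].append(name)
    return dict(groups)


# Short illustrative column list built from the names mentioned in the text.
columns = [
    "transaction_attribute_1",
    "transaction_attribute_2",
    "bureau_1",
    "bureau_2",
    "bureau_3",
    "bad_flag",
]
groups = group_features(columns)
```

    Matching on the trailing number rather than on a fixed list of prefixes means the grouping does not need to know the category names in advance.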

    Data Characteristics

    The dataset exhibits several characteristics that mirror operational credit risk data environments:

    • High Dimensionality: The feature space exceeds 1,200 variables
    • Mixed Data Types: Numerical values (continuous and discrete), binary indicators
    • High Sparsity: A substantial proportion of features contain zero values or missing entries
    • Value Range Disparity: Feature values exhibit significant variance, with magnitudes ranging from small ratios (0.001) to large transaction amounts (288,500)
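
    Two of these characteristics, sparsity and value range disparity, are commonly measured and handled before modeling. This is a minimal sketch, assuming a feature is held as a Python list in which missing entries are None; the helper names `sparsity` and `min_max_scale` are hypothetical, and min-max scaling is one possible normalization chosen for illustration, not a method prescribed by the dataset.

```python
def sparsity(values):
    """Fraction of entries that are zero or missing (None)."""
    if not values:
        return 0.0
    empty = sum(1 for v in values if v is None or v == 0)
    return empty / len(values)


def min_max_scale(values):
    """Rescale present values to [0, 1]; missing entries stay None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    low, high = min(present), max(present)
    span = high - low
    scaled = []
    for v in values:
        if v is None:
            scaled.append(None)
        elif span == 0:
            scaled.append(0.0)  # constant feature: no spread to rescale
        else:
            scaled.append((v - low) / span)
    return scaled


# Hypothetical feature for illustration; only the magnitudes 0.001 and
# 288,500 are taken from the description above.
feature = [0, None, 0.001, 288500, 0]
feature_sparsity = sparsity(feature)
feature_scaled = min_max_scale(feature)
```

    In practice each feature would be scaled separately, since ratios and transaction amounts sit on very different scales.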

    Methodological Rationale

    The dataset was constructed by simulating data generation processes typical within financial services institutions. Transactional behaviors, bureau records, and inquiry histories were aggregated and engineered into derivative features.

  3. UK spending on credit and debit cards

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated May 16, 2024
    Cite
    Office for National Statistics (2024). UK spending on credit and debit cards [Dataset]. https://www.ons.gov.uk/economy/economicoutputandproductivity/output/datasets/ukspendingoncreditanddebitcards
    Available download formats: xlsx
    Dataset updated
    May 16, 2024
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    Daily, weekly and monthly data showing seasonally adjusted and non-seasonally adjusted UK spending using debit and credit cards. These are official statistics in development. Source: CHAPS, Bank of England.

