3 datasets found

Default of Credit Card Clients Dataset
kaggle.com
Updated Nov 3, 2016
Cite
UCI Machine Learning (2016). Default of Credit Card Clients Dataset [Dataset]. https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset/discussion
Dataset updated
Nov 3, 2016
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
UCI Machine Learning
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Information

This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

Content

There are 25 variables:

ID: ID of each client

LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)

SEX: Gender (1=male, 2=female)

EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)

MARRIAGE: Marital status (1=married, 2=single, 3=others)

AGE: Age in years

PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)

PAY_2: Repayment status in August, 2005 (scale same as above)

PAY_3: Repayment status in July, 2005 (scale same as above)

PAY_4: Repayment status in June, 2005 (scale same as above)

PAY_5: Repayment status in May, 2005 (scale same as above)

PAY_6: Repayment status in April, 2005 (scale same as above)

BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)

BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)

BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)

BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)

BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)

BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)

PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)

PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)

PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)

PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)

PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)

PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)

default.payment.next.month: Default payment (1=yes, 0=no)
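The coded demographic columns above can be turned into readable labels with a small lookup. The following is a minimal sketch, not part of the dataset's own tooling: the `decode` helper and the example row are hypothetical, and only the code-to-label maps come from the variable list.

```python
# Code-to-label maps copied from the variable list above.
SEX = {1: "male", 2: "female"}
EDUCATION = {
    1: "graduate school",
    2: "university",
    3: "high school",
    4: "others",
    5: "unknown",
    6: "unknown",
}
MARRIAGE = {1: "married", 2: "single", 3: "others"}


def decode(row):
    """Return a copy of row with SEX, EDUCATION and MARRIAGE as labels.

    Codes that the description does not list are kept and flagged
    as undocumented rather than silently dropped.
    """
    out = dict(row)
    for column, mapping in (("SEX", SEX), ("EDUCATION", EDUCATION), ("MARRIAGE", MARRIAGE)):
        code = row[column]
        out[column] = mapping.get(code, f"undocumented ({code})")
    return out


# Hypothetical row, invented for illustration only.
example = {"ID": 1, "SEX": 2, "EDUCATION": 2, "MARRIAGE": 1, "AGE": 24}
decoded = decode(example)
print(decoded)
```

Flagging unlisted codes instead of raising an error is a design choice for exploration; a stricter pipeline might prefer to fail on them.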

Inspiration

Some ideas for exploration:

How does the probability of default payment vary by categories of different demographic variables?

Which variables are the strongest predictors of default payment?
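As a starting point for the first question, the default rate within each category of a demographic column can be tallied directly. This is a rough sketch under stated assumptions: `default_rate_by` is a hypothetical helper, rows are plain dictionaries keyed by the column names listed above, and the four sample rows are invented for illustration.

```python
from collections import defaultdict


def default_rate_by(rows, column, target="default.payment.next.month"):
    """Share of rows with target == 1 within each value of column."""
    counts = defaultdict(lambda: [0, 0])  # value -> [defaults, total]
    for row in rows:
        tally = counts[row[column]]
        tally[0] += row[target]
        tally[1] += 1
    return {value: defaults / total for value, (defaults, total) in counts.items()}


# Invented rows, not taken from the dataset.
rows = [
    {"SEX": 1, "default.payment.next.month": 1},
    {"SEX": 1, "default.payment.next.month": 0},
    {"SEX": 2, "default.payment.next.month": 0},
    {"SEX": 2, "default.payment.next.month": 0},
]
rates = default_rate_by(rows, "SEX")
print(rates)
```

The same helper can be pointed at EDUCATION or MARRIAGE by changing the `column` argument.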

Acknowledgements

Any publications based on this dataset should acknowledge the following:

Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

The original dataset can be found at the UCI Machine Learning Repository.

Credit Card Fraud Dataset

kaggle.com

Updated Jan 28, 2025

Cite

Vishal Painjane (2025). Credit Card Fraud Dataset [Dataset]. https://www.kaggle.com/datasets/vishalpainjane/dataset101

Dataset updated

Jan 28, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Vishal Painjane

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Credit risk assessment remains a critical function within financial services, influencing lending decisions, portfolio risk management, and regulatory compliance. This dataset integrates multiple categories of financial, transactional, and behavioral data to enable advanced machine learning applications in the domain of financial risk modeling.

Data Composition and Structure

The dataset comprises a total of 1,212 distinct features, systematically grouped into four principal categories, alongside a binary target variable. Each feature category represents a specific dimension of credit risk assessment, reflecting both internal transactional data and externally sourced credit bureau information.

Target Variable

The dependent variable, denoted as bad_flag, represents a binary risk classification outcome associated with each customer account. The variable takes the following values:

0: Denotes a low-risk, creditworthy customer
1: Denotes a high-risk, default-prone customer

This variable serves as the target for binary classification models aimed at predicting credit risk propensity.

Feature Groups

Category	Number of Features	Description
Transaction Attributes	664	Customer-level transaction behavior, repayment patterns, financial habits
Bureau Credit Data	452	Credit scores, external bureau records, delinquency flags, historical credit data
Bureau Enquiries	50	Credit inquiry history, frequency and type of external credit applications
ONUS Attributes	48	Internal bank relationship metrics, account engagement indicators

Each feature within a category follows a systematic sequential naming convention (e.g., transaction_attribute_1, bureau_1), facilitating programmatic identification and group-level analysis.
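The sequential naming convention makes it possible to recover feature groups from column names alone. The sketch below assumes every grouped column ends in an underscore followed by digits; `group_columns` is a hypothetical helper, and only the two prefixes quoted above (`transaction_attribute`, `bureau`) are taken from the description.

```python
import re
from collections import defaultdict


def group_columns(columns):
    """Group column names by the prefix before a trailing _<number>.

    Names without a numeric suffix (such as the target) form their own group.
    """
    groups = defaultdict(list)
    for name in columns:
        match = re.fullmatch(r"(.+)_(\d+)", name)
        prefix = match.group(1) if match else name
        groups[prefix].append(name)
    return dict(groups)


# Short illustrative column list; the real file has far more columns.
columns = [
    "transaction_attribute_1",
    "transaction_attribute_2",
    "bureau_1",
    "bureau_2",
    "bad_flag",
]
groups = group_columns(columns)
print(groups)
```

Comparing `len(groups[prefix])` with the counts in the feature group table is a quick sanity check after loading the data.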

Data Characteristics

The dataset exhibits several characteristics that mirror operational credit risk data environments:

High Dimensionality: The feature space exceeds 1,200 variables
Mixed Data Types: Numerical values (continuous and discrete), binary indicators
High Sparsity: A substantial proportion of features contain zero values or missing entries
Value Range Disparity: Feature values exhibit significant variance, with magnitudes ranging from small ratios (0.001) to large transaction amounts (288,500)
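Sparsity and range disparity of this kind are usually measured and handled per feature before modeling. The following is a minimal sketch, assuming missing entries are represented as `None`; `sparsity` and `min_max_scale` are hypothetical helpers, and min-max scaling is only one of several possible rescaling choices, not a method the dataset description prescribes.

```python
def sparsity(values):
    """Fraction of entries that are zero or missing (None)."""
    empty = sum(1 for v in values if v is None or v == 0)
    return empty / len(values)


def min_max_scale(values):
    """Rescale non-missing entries to [0, 1]; missing entries stay None.

    A constant feature (zero span) is mapped to 0.0 to avoid division by zero.
    """
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    span = high - low
    return [
        None if v is None else ((v - low) / span if span else 0.0)
        for v in values
    ]


# Illustrative feature using the two extreme magnitudes quoted above.
feature = [0.001, 0, None, 288500]
share_empty = sparsity(feature)
scaled = min_max_scale(feature)
print(share_empty, scaled)
```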

Methodological Rationale

The dataset was constructed by simulating data generation processes typical within financial services institutions. Transactional behaviors, bureau records, and inquiry histories were aggregated and engineered into derivative features.

UK spending on credit and debit cards
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated May 16, 2024
Cite
Office for National Statistics (2024). UK spending on credit and debit cards [Dataset]. https://www.ons.gov.uk/economy/economicoutputandproductivity/output/datasets/ukspendingoncreditanddebitcards
Dataset updated
May 16, 2024
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
United Kingdom
Description
Daily, weekly and monthly data showing seasonally adjusted and non-seasonally adjusted UK spending using debit and credit cards. These are official statistics in development. Source: CHAPS, Bank of England.

