9 datasets found

Data from: Credit Card Default Dataset
kaggle.com
zip
Updated Apr 30, 2023
Cite
Ifeanyichukwu Nwobodo (2023). Credit Card Default Dataset [Dataset]. https://www.kaggle.com/datasets/ifeanyichukwunwobodo/credit-card-default
Available download formats: zip (1126400 bytes)
Dataset updated
Apr 30, 2023
Authors
Ifeanyichukwu Nwobodo
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
Deciding whether to extend credit to a particular borrower is an important decision for financial institutions because it affects their revenue and profit. There is always a risk of default (not paying); this risk can be reduced by using data to identify which potential customers will pay back and which will default on their loan.

This dataset contains demographic and payment status data from a bank. It can be used to practice and hone exploratory data analysis and machine learning skills.
Default on Their Credit Card
kaggle.com
zip
Updated Apr 9, 2023
Cite
reel jojo (2023). Default on Their Credit Card [Dataset]. https://www.kaggle.com/datasets/reeljojo/default-on-their-credit-card
Available download formats: zip (2513923 bytes)
Dataset updated
Apr 9, 2023
Authors
reel jojo
Description
Abstract: This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods.

Source: UCI Machine Learning Repository

Data Set Information: This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.
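The regression check described above (Y = A + BX, with a good model giving A near zero, B near one, and a high coefficient of determination) can be illustrated with a minimal sketch. The function name `fit_line` and the sample numbers are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    b = sxy / sxx                    # regression coefficient B
    a = y_bar - b * x_bar            # regression intercept A
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Made-up values: X = predicted probability, Y = estimated real probability.
predicted = [0.05, 0.10, 0.20, 0.40, 0.60]
estimated_real = [0.06, 0.09, 0.22, 0.38, 0.61]
a, b, r2 = fit_line(predicted, estimated_real)
print(f"A = {a:.3f}, B = {b:.3f}, R^2 = {r2:.3f}")
```

Under this reading, a model whose fitted line has A close to 0 and B close to 1 produces predicted probabilities that can be used directly as estimates of the real probability.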

Attribute Information:

This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:

X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).

X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ...; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ...; X17 = amount of bill statement in April, 2005.

X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ...; X23 = amount paid in April, 2005.
Data from: Credit Card Default
kaggle.com
zip
Updated Feb 17, 2020
Cite
Andy_M (2020). Credit Card Default [Dataset]. https://www.kaggle.com/datasets/arindam235/credit-card-default
Available download formats: zip (1484551 bytes)
Dataset updated
Feb 17, 2020
Authors
Andy_M
Description
Context

This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.

Acknowledgements

Name: I-Cheng Yeh email addresses: (1) icyeh '@' chu.edu.tw (2) 140910 '@' mail.tku.edu.tw institutions: (1) Department of Information Management, Chung Hua University, Taiwan. (2) Department of Civil Engineering, Tamkang University, Taiwan. other contact information: 886-2-26215656 ext. 3181
Default of Credit Card Clients Data Set
kaggle.com
zip
Updated Apr 14, 2021
Cite
Bojan Tunguz (2021). Default of Credit Card Clients Data Set [Dataset]. https://www.kaggle.com/tunguz/default-of-credit-card-clients-data-set
Available download formats: zip (1028993 bytes)
Dataset updated
Apr 14, 2021
Authors
Bojan Tunguz
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Source:

Name: I-Cheng Yeh email addresses: (1) icyeh '@' chu.edu.tw (2) 140910 '@' mail.tku.edu.tw institutions: (1) Department of Information Management, Chung Hua University, Taiwan. (2) Department of Civil Engineering, Tamkang University, Taiwan. other contact information: 886-2-26215656 ext. 3181

Data Set Information:

This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.
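The description above names the "Sorting Smoothing Method" but does not spell it out. The sketch below shows one plausible reading, stated as an assumption: sort the records by predicted probability, then estimate each record's real probability as the average of the observed binary outcomes in a window of neighbours. The function name, the window parameter `n`, and the choice to truncate the window at the edges are all illustrative assumptions, not details confirmed by this page.

```python
def sorting_smoothing(predicted, outcomes, n=2):
    """Estimate a 'real' default probability per record.

    Assumed procedure: order records by predicted probability, then average
    the 0/1 outcomes of up to n neighbours on each side of every record.
    Results are returned in the original record order.
    """
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    sorted_outcomes = [outcomes[i] for i in order]

    estimates = [0.0] * len(predicted)
    for pos, original_index in enumerate(order):
        lo = max(0, pos - n)                         # window is truncated at the edges
        hi = min(len(sorted_outcomes), pos + n + 1)
        window = sorted_outcomes[lo:hi]
        estimates[original_index] = sum(window) / len(window)
    return estimates


# Made-up scores and outcomes for illustration only.
scores = [0.9, 0.1, 0.5, 0.3, 0.7]
defaults = [1, 0, 1, 0, 0]
print(sorting_smoothing(scores, defaults, n=1))
```

The smoothed values can then serve as the response variable Y in the regression Y = A + BX described in the same paragraph.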

Attribute Information:

This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:

X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).

X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ...; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ...; X17 = amount of bill statement in April, 2005.

X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ...; X23 = amount paid in April, 2005.

Relevant Papers:

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

Citation Request:

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.
Default of Credit Card Clients Dataset
kaggle.com
zip
Updated Nov 3, 2016
Cite
UCI Machine Learning (2016). Default of Credit Card Clients Dataset [Dataset]. https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset/discussion
Available download formats: zip (1025318 bytes)
Dataset updated
Nov 3, 2016
Dataset authored and provided by
UCI Machine Learning
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Information

This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

Content

There are 25 variables:

ID: ID of each client

LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)

SEX: Gender (1=male, 2=female)

EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)

MARRIAGE: Marital status (1=married, 2=single, 3=others)

AGE: Age in years

PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)

PAY_2: Repayment status in August, 2005 (scale same as above)

PAY_3: Repayment status in July, 2005 (scale same as above)

PAY_4: Repayment status in June, 2005 (scale same as above)

PAY_5: Repayment status in May, 2005 (scale same as above)

PAY_6: Repayment status in April, 2005 (scale same as above)

BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)

BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)

BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)

BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)

BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)

BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)

PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)

PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)

PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)

PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)

PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)

PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)

default.payment.next.month: Default payment (1=yes, 0=no)
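The coded columns listed above can be turned into readable labels with plain lookup tables. This is a minimal sketch: the helper names `describe_pay_status` and `decode_row` are hypothetical, and any code outside the documented scale is reported as undocumented rather than guessed.

```python
# Lookup tables copied from the variable descriptions above.
SEX = {1: "male", 2: "female"}
EDUCATION = {
    1: "graduate school", 2: "university", 3: "high school",
    4: "others", 5: "unknown", 6: "unknown",
}
MARRIAGE = {1: "married", 2: "single", 3: "others"}


def describe_pay_status(code):
    """Translate a PAY_* repayment status code using the documented scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        return f"payment delay for {code} month(s)"
    if code == 9:
        return "payment delay for nine months and above"
    return f"undocumented code {code}"


def decode_row(row):
    """Return readable labels for the coded fields of one CSV row.

    Values may be strings, as produced by csv.DictReader.
    """
    return {
        "SEX": SEX.get(int(row["SEX"]), "undocumented"),
        "EDUCATION": EDUCATION.get(int(row["EDUCATION"]), "undocumented"),
        "MARRIAGE": MARRIAGE.get(int(row["MARRIAGE"]), "undocumented"),
        "PAY_0": describe_pay_status(int(row["PAY_0"])),
    }


print(decode_row({"SEX": "2", "EDUCATION": "1", "MARRIAGE": "3", "PAY_0": "-1"}))
```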

Inspiration

Some ideas for exploration:

How does the probability of default payment vary by categories of different demographic variables?

Which variables are the strongest predictors of default payment?
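The first question can be approached by computing the observed default rate within each category of a demographic column. The sketch below uses only the standard library and a tiny made-up sample in place of the real CSV file; the function name `default_rate_by` is an assumption for illustration.

```python
import csv
import io
from collections import defaultdict

TARGET = "default.payment.next.month"


def default_rate_by(rows, column, target=TARGET):
    """Return {category: share of rows with target == 1} for one column."""
    counts = defaultdict(lambda: [0, 0])      # category -> [defaults, total]
    for row in rows:
        counts[row[column]][0] += int(row[target])
        counts[row[column]][1] += 1
    return {category: d / total for category, (d, total) in counts.items()}


# Made-up rows that only mimic the column names described above.
SAMPLE = """ID,SEX,EDUCATION,default.payment.next.month
1,2,2,1
2,2,2,0
3,1,1,0
4,1,3,1
5,1,1,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for column in ("SEX", "EDUCATION"):
    print(column, default_rate_by(rows, column))
```

To run this against the actual file, replace `io.StringIO(SAMPLE)` with an opened file handle for the downloaded CSV.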

Acknowledgements

Any publications based on this dataset should acknowledge the following:

Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

The original dataset can be found at the UCI Machine Learning Repository.
Credit_Scoring_Data
kaggle.com
Updated Aug 5, 2023
Cite
AdityaRaj Sharma (2023). Credit_Scoring_Data [Dataset]. https://www.kaggle.com/datasets/cs49adityarajsharma/credit-scoring-data
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 5, 2023
Dataset provided by
Kaggle
Authors
AdityaRaj Sharma
Description
Introduction:

This dataset analysis aims to explore and analyze a Credit Score dataset to gain insights into customer creditworthiness and segmentation. The dataset contains information on various factors that influence credit scores, such as payment history, credit utilization ratio, number of credit accounts, education level, and employment status. The analysis will utilize the k-means algorithm to perform clustering and identify distinct groups of customers based on their credit scores.

The Credit Score dataset comprises a collection of records, each representing an individual's credit profile. The features included in the dataset are as follows:

Description of all features:

(1). Age: This feature represents the age of the individual.

(2). Gender: This feature captures the gender of the individual.

(3). Marital Status: This feature denotes the marital status of the individual.

(4). Education Level: This feature represents the highest level of education attained by the individual.

(5). Employment Status: This feature indicates the current employment status of the individual.

(6). Credit Utilization Ratio: This feature reflects the ratio of credit used by the individual compared to their total available credit limit.

(7). Payment History: It represents the monthly net payment behaviour of each customer, taking into account factors such as on-time payments, late payments, missed payments, and defaults.

(8). Number of Credit Accounts: It represents the count of active credit accounts the person holds.

(9). Loan Amount: It indicates the monetary value of the loan.

(10). Interest Rate: This feature represents the interest rate associated with the loan.

(11). Loan Term: This feature denotes the duration or term of the loan.

(12). Type of Loan: It includes categories like “Personal Loan,” “Auto Loan,” or potentially other types of loans.
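The introduction mentions using the k-means algorithm to segment customers. A minimal pure-Python sketch of k-means follows, assuming two numeric features per customer that have already been scaled to a comparable range (here a credit utilization ratio and a hypothetical payment-history score between 0 and 1). The initialisation from the first `k` points and the fixed iteration count are simplifications chosen for the example.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (assignment, centroids).

    Simplified sketch: centroids start at the first k points and the loop
    runs a fixed number of iterations instead of testing for convergence.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up customers: (credit utilization ratio, payment-history score).
customers = [(0.10, 0.90), (0.90, 0.20), (0.15, 0.85), (0.80, 0.10)]
labels, centers = kmeans(customers, k=2)
print(labels, centers)
```

Categorical features such as Gender or Type of Loan would need to be encoded numerically before they could be used with a distance-based method like this one.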
Data from: Bank Loan
kaggle.com
zip
Updated Dec 24, 2023
Cite
Zahra Zolghadr (2023). Bank Loan [Dataset]. https://www.kaggle.com/datasets/zahrazolghadr/bank-loan
Available download formats: zip (12603 bytes)
Dataset updated
Dec 24, 2023
Authors
Zahra Zolghadr
Description
The "bank-loan" dataset focuses on credit scoring. It contains 700 records of bank customers who obtained loans, with the repayment outcome coded as 1 or 0 for default status. The objective is to develop a robust credit scoring system to support loan approval decisions. The system draws on factors including age, education, employment duration, tenure at the current residence, income level, debt-to-income ratio, credit card debt, and other debts reported at the time of loan application. The aim is to build a predictive model that helps the financial institution make informed decisions on loan applications, improving risk management and the soundness of its lending practices.

Age: Age in years.

Ed: 1-Did not complete high school 2-High school degree 3-Some college 4-College degree 5-Post-undergraduate degree

Employ: Years with current employer

Address: Years at current address

Income: Household income in thousands

Debtinc: Debt to income ratio (x100)

Creddebt: Credit card debt in thousands

Othdebt: Other debt in thousands

Default: The target variable, indicating whether the customer previously defaulted. It is binary, with 1 typically denoting a "bad" (default) status and 0 a "good" repayment history.
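The Debtinc field is described as a debt-to-income ratio multiplied by 100. A small sketch of that arithmetic follows, under the assumption (not confirmed by the description) that total debt is the sum of Creddebt and Othdebt and that Income is in the same units of thousands. The function name is hypothetical.

```python
def debt_to_income_pct(creddebt, othdebt, income):
    """Debt-to-income ratio (x100).

    Assumption: total debt = credit card debt + other debt, with all three
    inputs expressed in thousands as in the field list above.
    """
    if income <= 0:
        raise ValueError("income must be positive")
    return 100.0 * (creddebt + othdebt) / income


# Made-up applicant: 1.5k credit card debt, 3.5k other debt, 50k income.
print(debt_to_income_pct(1.5, 3.5, 50))
```

Comparing this computed value with the stored Debtinc column is one quick way to test whether the assumption holds for the actual data.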
Data from: Loan Default Prediction
kaggle.com
zip
Updated Aug 15, 2022
Cite
Marc Buji (2022). Loan Default Prediction [Dataset]. https://www.kaggle.com/datasets/marcbuji/loan-default-prediction
Available download formats: zip (5389782 bytes)
Dataset updated
Aug 15, 2022
Authors
Marc Buji
Description
Banks incur losses when customers do not repay their loans on time. Every year these losses run into crores, which also affects the country's economic growth to a large extent. We look at attributes such as funded amount, location, loan, and balance to predict whether a person will default on a loan. To build a model for this problem, Grant Group Funding has a dataset of 87,501 rows and 30 columns based on a client in the banking sector.

ID: Unique ID

Asst_Reg: Value of all the assets registered under the borrower's name

GGGrade: Grant Group Grade

Experience: Total years of work experience of the borrower

Validation: Validation status of the borrower

Yearly Income: Total yearly income of the borrower

Home Status: Borrower's living status

Unpaid 2 years: Number of times the borrower has defaulted in the last two years

Already Defaulted: Number of other loans on which the borrower has defaulted

Designation: Designation of the borrower

Debt to Income: Debt-to-income ratio

Postal Code: Postal code of the borrower

Lend Amount: Total amount funded to the borrower

Deprecatory Records: An entry that may be considered negative by lenders because it indicates risk and hurts your ability to qualify for credit or other services

Interest Charged: Interest charged on the total amount

Usage Rate: Processing charges on the loan amount

Inquiries: Inquiries in the last 6 months

Present Balance: Current balance in the borrower's account

Gross Collection: The gross amount payable by way of settlement or judgment in respect of the claims, excluding any costs

Sub GGGrade: Sub Grant Group Grade

File Status: Status of the loan file

State: State to which the borrower belongs

Account Open: Total number of open accounts in the name of the borrower

Total Unpaid CL: Unpaid dues on all the other loans

Duration: Duration for which the amount is funded to the borrower

Unpaid Amount: Unpaid balance on the credit card

Reason: Reason for the loan application

Claim Type: The borrower's claim type among all application types (I = Individual Account, J = Joint Account)

Due Fee: Charges incurred if payment on the loan amount is delayed

Loan/No Loan: Target variable
UCI Credit Card(From Python WOE PKG)
kaggle.com
zip
Updated Apr 8, 2018
Cite
WilsonF (2018). UCI Credit Card(From Python WOE PKG) [Dataset]. https://www.kaggle.com/datasets/wilsonf/uci-credit-carefrom-python-woe-pkg/discussion
Available download formats: zip (2114904 bytes)
Dataset updated
Apr 8, 2018
Authors
WilsonF
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Information

This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

You should specify the variables' dtypes with config.csv.

Convention:

continuous variables: is_tobe_bin=1 and is_candidate=1

discrete variables: is_tobe_bin=0 and is_candidate=1
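The convention above can be read as a simple rule over the rows of config.csv. The sketch below applies that rule with the standard library. The flag columns `is_tobe_bin` and `is_candidate` come from the description; the column name `var_name` and the sample rows are assumptions, since the actual layout of config.csv is not shown here.

```python
import csv
import io


def split_variables(config_rows):
    """Split candidate variables into (continuous, discrete) name lists.

    Rule taken from the description: is_candidate=1 with is_tobe_bin=1 means
    continuous; is_candidate=1 with is_tobe_bin=0 means discrete.
    """
    continuous, discrete = [], []
    for row in config_rows:
        if row["is_candidate"].strip() != "1":
            continue                          # not a candidate: ignore
        if row["is_tobe_bin"].strip() == "1":
            continuous.append(row["var_name"])
        elif row["is_tobe_bin"].strip() == "0":
            discrete.append(row["var_name"])
    return continuous, discrete


# Hypothetical config contents; "var_name" is an assumed column name.
SAMPLE_CONFIG = """var_name,is_tobe_bin,is_candidate
LIMIT_BAL,1,1
AGE,1,1
SEX,0,1
ID,0,0
"""

config_rows = list(csv.DictReader(io.StringIO(SAMPLE_CONFIG)))
print(split_variables(config_rows))
```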

Content

There are 25 variables:

ID: ID of each client

LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)

SEX: Gender (1=male, 2=female)

EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)

MARRIAGE: Marital status (1=married, 2=single, 3=others)

AGE: Age in years

PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)

PAY_2: Repayment status in August, 2005 (scale same as above)

PAY_3: Repayment status in July, 2005 (scale same as above)

PAY_4: Repayment status in June, 2005 (scale same as above)

PAY_5: Repayment status in May, 2005 (scale same as above)

PAY_6: Repayment status in April, 2005 (scale same as above)

BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)

BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)

BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)

BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)

BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)

BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)

PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)

PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)

PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)

PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)

PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)

PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)

default.payment.next.month: Default payment (1=yes, 0=no)

Our target

To apply a WoE (weight of evidence) transformation for a scorecard model for credit rating.

A Python package for this is available on GitHub:

https://github.com/boredbird/woe
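As a rough illustration of what a WoE transformation computes, the sketch below derives a weight of evidence per bin and the resulting information value from binned values and a 0/1 default flag. It does not use or reproduce the API of the package linked above. The sign convention (log of the non-default share over the default share) and the optional additive smoothing are assumptions; some tools use the opposite sign.

```python
import math
from collections import Counter


def woe_table(bins, targets, smoothing=0.0):
    """Return ({bin: woe}, information_value).

    WoE(bin) = ln(share of non-defaults in bin / share of defaults in bin).
    A positive smoothing value avoids log(0) when a bin lacks one class.
    """
    good, bad = Counter(), Counter()
    for b, t in zip(bins, targets):
        (bad if t == 1 else good)[b] += 1
    labels = sorted(set(bins))
    total_good = sum(good.values()) + smoothing * len(labels)
    total_bad = sum(bad.values()) + smoothing * len(labels)
    if total_good == 0 or total_bad == 0:
        raise ValueError("both classes must be present (or smoothing > 0)")

    woe, iv = {}, 0.0
    for b in labels:
        good_share = (good[b] + smoothing) / total_good
        bad_share = (bad[b] + smoothing) / total_bad
        if good_share == 0 or bad_share == 0:
            raise ValueError(f"bin {b!r} lacks one class; use smoothing > 0")
        woe[b] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[b]
    return woe, iv


# Made-up binned credit limits and default flags (1 = default).
limit_bins = ["low", "low", "low", "high", "high", "high"]
flags = [1, 1, 0, 0, 0, 1]
print(woe_table(limit_bins, flags))
```

In a scorecard workflow, each raw value is then replaced by the WoE of its bin before fitting a model such as logistic regression.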