ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
The decision to give credit to a particular borrower is very important for financial institutions because it affects their revenue and profit. There is always a risk of default (not paying). This risk can be reduced by using data to identify which potential customers will pay back their loan and which will default.
This dataset contains demographic and payment status data from a bank. The dataset can be used to practice and hone your exploratory data analysis and machine learning skills.
Abstract: This research examines the case of customers’ default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods.
Source: UCI Machine Learning Repository
Data Set Information: This research examines the case of customers’ default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. From the perspective of risk management, the predictive accuracy of the estimated probability of default is more valuable than the binary result of classification (credible or not credible clients). Because the real probability of default is unknown, this study presented the novel “Sorting Smoothing Method” to estimate the real probability of default. With the real probability of default as the response variable (Y) and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and its regression coefficient (B) is close to one. Therefore, among the six data mining techniques, the artificial neural network is the only one that can accurately estimate the real probability of default.
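The regression check described above (Y = A + BX, with A near zero, B near one, and a high coefficient of determination) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `fit_line` and the sample values are hypothetical, and the estimated real probabilities are assumed to be already available from some smoothing step.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # regression coefficient B
    a = mean_y - b * mean_x            # regression intercept A
    r_squared = (sxy * sxy) / (sxx * syy)  # coefficient of determination
    return a, b, r_squared


# Hypothetical values for illustration only; they are not taken from the study.
predicted = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]       # X: model output
estimated_real = [0.06, 0.09, 0.22, 0.38, 0.61, 0.79]  # Y: estimated real probability

a, b, r2 = fit_line(predicted, estimated_real)
print(f"A = {a:.3f}, B = {b:.3f}, R^2 = {r2:.3f}")
```

A model whose predicted probabilities track the estimated real probabilities closely would show A near 0, B near 1, and R^2 near 1 under this check.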
Attribute Information:
This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:
X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (year).
X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ...; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ...; X17 = amount of bill statement in April, 2005.
X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ...; X23 = amount paid in April, 2005.
Name: I-Cheng Yeh email addresses: (1) icyeh '@' chu.edu.tw (2) 140910 '@' mail.tku.edu.tw institutions: (1) Department of Information Management, Chung Hua University, Taiwan. (2) Department of Civil Engineering, Tamkang University, Taiwan. other contact information: 886-2-26215656 ext. 3181
https://creativecommons.org/publicdomain/zero/1.0/
Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
There are 25 variables:
Some ideas for exploration:
Any publications based on this dataset should acknowledge the following:
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
The original dataset can be found here at the UCI Machine Learning Repository.
Introduction:
This dataset analysis aims to explore and analyze a Credit Score dataset to gain insights into customer creditworthiness and segmentation. The dataset contains information on various factors that influence credit scores, such as payment history, credit utilization ratio, number of credit accounts, education level, and employment status. The analysis will utilize the k-means algorithm to perform clustering and identify distinct groups of customers based on their credit scores.
The Credit Score dataset comprises a collection of records, each representing an individual's credit profile. The dataset contains the following features:
(1). Age: This feature represents the age of the individual.
(2). Gender: This feature captures the gender of the individual.
(3). Marital Status: This feature denotes the marital status of the individual.
(4). Education Level: This feature represents the highest level of education attained by the individual.
(5). Employment Status: This feature indicates the current employment status of the individual.
(6). Credit Utilization Ratio: This feature reflects the ratio of credit used by the individual compared to their total available credit limit.
(7). Payment History: It represents the monthly net payment behaviour of each customer, taking into account factors such as on-time payments, late payments, missed payments, and defaults.
(8). Number of Credit Accounts: It represents the count of active credit accounts the person holds.
(9). Loan Amount: It indicates the monetary value of the loan.
(10). Interest Rate: This feature represents the interest rate associated with the loan.
(11). Loan Term: This feature denotes the duration or term of the loan.
(12). Type of Loan: It includes categories like “Personal Loan,” “Auto Loan,” or potentially other types of loans.
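The k-means clustering mentioned in the introduction can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the analysis's actual code: the function name `kmeans`, the choice of two features (credit utilization ratio and number of credit accounts), and the sample points are all hypothetical. In practice the features would usually be standardized first so that no single column dominates the distance.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical customers: (credit utilization ratio, number of credit accounts).
customers = [(0.10, 1), (0.20, 2), (0.15, 1), (0.90, 9), (0.80, 8), (0.95, 10)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The cluster indices themselves are arbitrary; only the grouping of customers is meaningful.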
In financial decision-making, the "bank-loan" dataset focuses on credit scoring. It holds 700 records of bank customers who obtained loans, with the repayment outcome coded as 1 or 0 for default status. The objective is to develop a robust credit scoring system to support loan approval decisions. This system draws on factors including age, education, employment duration, tenure at the current residence, income, debt-to-income ratio, credit card debt, and other debts reported at the time of loan application. The aim is to construct a predictive model that helps the financial institution make informed decisions on loan applications, improving risk management and the soundness of lending practices.
Age: Age in years.
Ed: 1-Did not complete high school 2-High school degree 3-Some college 4-College degree 5-Post-undergraduate degree
Employ: Years with current employer
Address: Years at current address
Income: Household income in thousands
Debtinc: Debt to income ratio (x100)
Creddebt: Credit card debt in thousands
Othdebt: Other debt in thousands
Default: The target variable, indicating whether the customer previously defaulted. It is binary: 1 typically denotes a "bad" default status and 0 a "good" repayment history.
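One way to build the kind of predictive model described above is logistic regression, sketched here in plain Python with batch gradient descent. This is an illustration only: the dataset description does not name a modeling method, the function names are hypothetical, and the eight rows below are invented values for Debtinc and Employ rather than records from the dataset.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability of default for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, targets, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, targets):
            error = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Invented rows: (Debtinc, Employ), divided by 10 to keep the features on a similar scale.
raw = [(5.0, 12), (8.0, 10), (10.0, 15), (12.0, 3), (22.0, 2), (25.0, 1), (30.0, 4), (18.0, 0)]
default = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [(d / 10.0, e / 10.0) for d, e in raw]

weights, bias = train_logistic(rows, default)
applicant = (28.0 / 10.0, 1 / 10.0)  # hypothetical applicant: Debtinc 28, Employ 1
print(f"estimated default probability: {predict_proba(weights, bias, applicant):.2f}")
```

A real scoring system would add the remaining variables, hold out data for validation, and choose an approval threshold based on the cost of each type of error.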
Banks run into losses when a customer doesn't pay their loans on time. Because of this, banks lose crores every year, and this also impacts the country's economic growth to a large extent. We look at various attributes such as funded amount, location, loan, balance, etc., to predict whether a person will be a loan defaulter. To build a model to solve this problem, Grant Group Funding has a dataset of 87,501 rows and 30 columns from a client in the banking sector.
ID: Unique ID
Asst_Reg: Value of all the assets registered under the borrower's name
GGGrade: Grant Group Grade
Experience: Total years of work experience of the borrower
Validation: Validation status of the borrower
Yearly Income: Total yearly income of the borrower
Home Status: Borrower living status
Unpaid 2 years: Number of times the borrower has defaulted in the last two years
Already Defaulted: Number of other loans on which the borrower has defaulted
Designation: Designation of the borrower
Debt to Income: Debt to income ratio
Postal Code: Postal code of the borrower
Lend Amount: Total amount funded to the borrower
Deprecatory Records: An entry that may be considered negative by lenders because it indicates risk and hurts your ability to qualify for credit or other services
Interest Charged: Interest charged on the total amount
Usage Rate: Processing charges on the loan amount
Inquiries: Inquiries in the last 6 months
Present Balance: Current balance in the borrower's account
Gross Collection: The gross amount payable by way of settlement or judgment in respect of the claims, excluding any costs
Sub GGGrade: Sub Grant Group Grade
File Status: Status of the loan file
State: State to which the borrower belongs
Account Open: Total number of open accounts in the name of the borrower
Total Unpaid CL: Unpaid dues on all the other loans
Duration: Duration for which the amount is funded to the borrower
Unpaid Amount: Unpaid balance on the credit card
Reason: Reason for the loan application
Claim Type: Application type of the borrower (I = Individual Account, J = Joint Account)
Due Fee: Charges incurred if payment on the loan amount is delayed
Loan/No Loan: Target variable
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
Specify the variable dtypes in config.csv:
continuous variables: is_tobe_bin=1 and is_candidate=1
discrete variables: is_tobe_bin=0 and is_candidate=1
There are 25 variables:
ID: ID of each client
LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)
SEX: Gender (1=male, 2=female)
EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
MARRIAGE: Marital status (1=married, 2=single, 3=others)
AGE: Age in years
PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)
PAY_2: Repayment status in August, 2005 (scale same as above)
PAY_3: Repayment status in July, 2005 (scale same as above)
PAY_4: Repayment status in June, 2005 (scale same as above)
PAY_5: Repayment status in May, 2005 (scale same as above)
PAY_6: Repayment status in April, 2005 (scale same as above)
BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)
BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)
BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)
BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)
BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)
BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)
PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)
PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)
PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)
PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)
PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)
PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)
default.payment.next.month: Default payment (1=yes, 0=no)
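The coded columns in the list above can be turned into readable labels with a few lookup tables. The sketch below is an illustration: the function names are hypothetical, and any code outside the documented scales is reported as undocumented rather than guessed.

```python
# Code tables copied from the variable list above.
SEX = {1: "male", 2: "female"}
EDUCATION = {
    1: "graduate school",
    2: "university",
    3: "high school",
    4: "others",
    5: "unknown",
    6: "unknown",
}
MARRIAGE = {1: "married", 2: "single", 3: "others"}


def label(mapping, code):
    """Look up a code, flagging values the documentation does not define."""
    return mapping.get(code, f"undocumented code {code}")


def describe_pay_status(code):
    """Translate a PAY_* repayment status code using the documented scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        return f"payment delay for {code} month(s)"
    if code == 9:
        return "payment delay for nine months and above"
    return f"undocumented code {code}"


def decode_row(row):
    """Return readable labels for the coded fields of one record (a dict)."""
    return {
        "SEX": label(SEX, row["SEX"]),
        "EDUCATION": label(EDUCATION, row["EDUCATION"]),
        "MARRIAGE": label(MARRIAGE, row["MARRIAGE"]),
        "PAY_0": describe_pay_status(row["PAY_0"]),
    }


# Hypothetical record for illustration.
print(decode_row({"SEX": 2, "EDUCATION": 2, "MARRIAGE": 1, "PAY_0": 2}))
```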
The dataset is intended for a WoE (weight of evidence) transformation for a scorecard model for credit rating.
A Python package for this is available on GitHub.
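A WoE transformation replaces each bin of a variable with the log ratio of its share of non-defaulters to its share of defaulters. The sketch below illustrates the idea in plain Python and is not the GitHub package's API: the function name `woe_table`, the bin edges, the sign convention (good over bad; some references use the inverse), the 0.5 additive smoothing, and the sample values are all assumptions.

```python
import math
from bisect import bisect_right


def woe_table(values, targets, edges):
    """Return {bin_index: WoE}, where bin i holds edges[i-1] <= value < edges[i].

    targets use 1 for default ("bad") and 0 for no default ("good").
    """
    n_bins = len(edges) + 1
    good = [0] * n_bins
    bad = [0] * n_bins
    for value, target in zip(values, targets):
        bin_index = bisect_right(edges, value)
        if target == 1:
            bad[bin_index] += 1
        else:
            good[bin_index] += 1
    total_good = sum(good)
    total_bad = sum(bad)
    table = {}
    for i in range(n_bins):
        # Additive smoothing (an assumption of this sketch) avoids log(0)
        # when a bin contains no good or no bad records.
        share_good = (good[i] + 0.5) / (total_good + 0.5 * n_bins)
        share_bad = (bad[i] + 0.5) / (total_bad + 0.5 * n_bins)
        table[i] = math.log(share_good / share_bad)
    return table


# Hypothetical LIMIT_BAL values and default flags for illustration.
limit_bal = [20000, 30000, 50000, 80000, 120000, 200000, 300000, 500000]
default = [1, 1, 0, 1, 0, 0, 0, 0]
print(woe_table(limit_bal, default, edges=[50000, 150000]))
```

With this sign convention, a positive WoE marks a bin with relatively more non-defaulters and a negative WoE marks a bin with relatively more defaulters.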