9 datasets found
  1. Data from: Credit Card Default Dataset

    • kaggle.com
    zip
    Updated Apr 30, 2023
    Cite
    Ifeanyichukwu Nwobodo (2023). Credit Card Default Dataset [Dataset]. https://www.kaggle.com/datasets/ifeanyichukwunwobodo/credit-card-default
    Explore at:
    Available download formats: zip (1126400 bytes)
    Dataset updated
    Apr 30, 2023
    Authors
    Ifeanyichukwu Nwobodo
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
    License information was derived automatically

    Description

    The decision to give credit to a particular borrower is very important for financial institutions, as it affects their revenue and profit. There is always a risk of default (not paying). This risk can be reduced by using data to identify the potential customers who will pay back and the ones who will default on their loan.

    This dataset contains demographic and payment status data from a bank. The dataset can be used to practice and hone your exploratory data analysis and machine learning skills.

  2. Default on Their Credit Card

    • kaggle.com
    zip
    Updated Apr 9, 2023
    + more versions
    Cite
    reel jojo (2023). Default on Their Credit Card [Dataset]. https://www.kaggle.com/datasets/reeljojo/default-on-their-credit-card
    Explore at:
    Available download formats: zip (2513923 bytes)
    Dataset updated
    Apr 9, 2023
    Authors
    reel jojo
    Description

    Abstract: This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods.

    Source: UCI Machine Learning Repository

    Data Set Information: This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.
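
    The evaluation described above can be sketched in a few lines. The listing does not define the Sorting Smoothing Method, so this sketch assumes it means: sort cases by predicted probability, then estimate each case's real probability as the mean of the 0/1 outcomes in a window of neighbours. The window half-width n, the function names, and the synthetic scores are all assumptions made for illustration, not taken from the dataset or the paper's code.

```python
import random


def sorting_smoothing(pred, actual, n):
    """Sort cases by predicted probability, then average the 0/1 outcomes
    over a window of up to 2n+1 neighbours (truncated at both ends)."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    sorted_pred = [pred[i] for i in order]
    sorted_y = [actual[i] for i in order]
    smoothed = []
    for k in range(len(sorted_y)):
        lo, hi = max(0, k - n), min(len(sorted_y), k + n + 1)
        window = sorted_y[lo:hi]
        smoothed.append(sum(window) / len(window))
    return sorted_pred, smoothed


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Synthetic, deliberately well-calibrated scores (not real data).
rng = random.Random(0)
pred = [rng.random() for _ in range(2000)]
actual = [1 if rng.random() < p else 0 for p in pred]

x, y = sorting_smoothing(pred, actual, n=50)
a, b, r2 = fit_line(x, y)
print(f"A = {a:.3f}, B = {b:.3f}, R^2 = {r2:.3f}")
```

    For scores that are well calibrated, as in this synthetic case, A should land near zero and B near one, which is the criterion the description uses to compare models.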

    Attribute Information:

    This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:

    • X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
    • X2: Gender (1 = male; 2 = female).
    • X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
    • X4: Marital status (1 = married; 2 = single; 3 = others).
    • X5: Age (year).
    • X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ...; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
    • X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ...; X17 = amount of bill statement in April, 2005.
    • X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ...; X23 = amount paid in April, 2005.

  3. Data from: Credit Card Default

    • kaggle.com
    zip
    Updated Feb 17, 2020
    + more versions
    Cite
    Andy_M (2020). Credit Card Default [Dataset]. https://www.kaggle.com/datasets/arindam235/credit-card-default
    Explore at:
    Available download formats: zip (1484551 bytes)
    Dataset updated
    Feb 17, 2020
    Authors
    Andy_M
    Description

    Context

    This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.

    Acknowledgements

    Name: I-Cheng Yeh email addresses: (1) icyeh '@' chu.edu.tw (2) 140910 '@' mail.tku.edu.tw institutions: (1) Department of Information Management, Chung Hua University, Taiwan. (2) Department of Civil Engineering, Tamkang University, Taiwan. other contact information: 886-2-26215656 ext. 3181

  4. Default of Credit Card Clients Data Set

    • kaggle.com
    zip
    Updated Apr 14, 2021
    Cite
    Bojan Tunguz (2021). Default of Credit Card Clients Data Set [Dataset]. https://www.kaggle.com/tunguz/default-of-credit-card-clients-data-set
    Explore at:
    Available download formats: zip (1028993 bytes)
    Dataset updated
    Apr 14, 2021
    Authors
    Bojan Tunguz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Source:

    Name: I-Cheng Yeh email addresses: (1) icyeh '@' chu.edu.tw (2) 140910 '@' mail.tku.edu.tw institutions: (1) Department of Information Management, Chung Hua University, Taiwan. (2) Department of Civil Engineering, Tamkang University, Taiwan. other contact information: 886-2-26215656 ext. 3181

    Data Set Information:

    This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.

    Attribute Information:

    This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:

    • X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
    • X2: Gender (1 = male; 2 = female).
    • X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
    • X4: Marital status (1 = married; 2 = single; 3 = others).
    • X5: Age (year).
    • X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ...; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
    • X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ...; X17 = amount of bill statement in April, 2005.
    • X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ...; X23 = amount paid in April, 2005.
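
    The X1-X23 variables appear to line up with the named columns (LIMIT_BAL, PAY_0, BILL_AMT1, ...) used by the UCI Machine Learning listing in these results, since the descriptions match one for one. A minimal sketch of that correspondence follows. The pairing is inferred from the matching descriptions rather than stated by either listing, and both function names are made up for illustration.

```python
def x_to_column_names():
    """Map the X1..X23 labels to the named columns, assuming the two
    listings describe the same variables in the same order."""
    names = {
        "X1": "LIMIT_BAL",
        "X2": "SEX",
        "X3": "EDUCATION",
        "X4": "MARRIAGE",
        "X5": "AGE",
    }
    # X6..X11: repayment status, September back to April 2005.
    # The named columns skip PAY_1 (PAY_0, PAY_2, ..., PAY_6).
    for offset, suffix in enumerate([0, 2, 3, 4, 5, 6]):
        names[f"X{6 + offset}"] = f"PAY_{suffix}"
    # X12..X17: bill statements; X18..X23: previous payments.
    for k in range(1, 7):
        names[f"X{11 + k}"] = f"BILL_AMT{k}"
        names[f"X{17 + k}"] = f"PAY_AMT{k}"
    return names


def describe_repayment_status(code):
    """Translate a repayment status code using the documented scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        return f"payment delay for {code} month(s)"
    if code == 9:
        return "payment delay for nine months and above"
    return "code not covered by the documented scale"


mapping = x_to_column_names()
print(mapping["X6"], mapping["X12"], mapping["X18"])
print(describe_repayment_status(2))
```

    Codes outside the documented scale are reported as such rather than guessed at.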

    Relevant Papers:

    Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

    Citation Request:

    Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

  5. Default of Credit Card Clients Dataset

    • kaggle.com
    zip
    Updated Nov 3, 2016
    Cite
    UCI Machine Learning (2016). Default of Credit Card Clients Dataset [Dataset]. https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset/discussion
    Explore at:
    Available download formats: zip (1025318 bytes)
    Dataset updated
    Nov 3, 2016
    Dataset authored and provided by
    UCI Machine Learning
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Information

    This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

    Content

    There are 25 variables:

    • ID: ID of each client
    • LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)
    • SEX: Gender (1=male, 2=female)
    • EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
    • MARRIAGE: Marital status (1=married, 2=single, 3=others)
    • AGE: Age in years
    • PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)
    • PAY_2: Repayment status in August, 2005 (scale same as above)
    • PAY_3: Repayment status in July, 2005 (scale same as above)
    • PAY_4: Repayment status in June, 2005 (scale same as above)
    • PAY_5: Repayment status in May, 2005 (scale same as above)
    • PAY_6: Repayment status in April, 2005 (scale same as above)
    • BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)
    • BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)
    • BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)
    • BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)
    • BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)
    • BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)
    • PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)
    • PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)
    • PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)
    • PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)
    • PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)
    • PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)
    • default.payment.next.month: Default payment (1=yes, 0=no)

    Inspiration

    Some ideas for exploration:

    1. How does the probability of default payment vary by categories of different demographic variables?
    2. Which variables are the strongest predictors of default payment?
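
    The first question can be approached by grouping clients on one demographic column and computing the share that defaulted in each group. The sketch below uses the column names from the variable list above; the handful of rows is made up purely to show the mechanics and is not taken from the dataset, and the function name is hypothetical.

```python
from collections import defaultdict


def default_rate_by(rows, column, target="default.payment.next.month"):
    """Return {category: share of rows with target == 1} for one column."""
    counts = defaultdict(lambda: [0, 0])  # category -> [defaults, total]
    for row in rows:
        counts[row[column]][0] += int(row[target])
        counts[row[column]][1] += 1
    return {category: d / t for category, (d, t) in counts.items()}


# Made-up rows for illustration only (EDUCATION: 1 = graduate school, 2 = university).
rows = [
    {"EDUCATION": 1, "default.payment.next.month": 0},
    {"EDUCATION": 1, "default.payment.next.month": 1},
    {"EDUCATION": 2, "default.payment.next.month": 1},
    {"EDUCATION": 2, "default.payment.next.month": 0},
    {"EDUCATION": 2, "default.payment.next.month": 0},
    {"EDUCATION": 2, "default.payment.next.month": 0},
]

rates = default_rate_by(rows, "EDUCATION")
print(rates)  # {1: 0.5, 2: 0.25}
```

    Rows read with csv.DictReader should work the same way, because the target is converted with int(); the category keys would then be strings rather than integers.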

    Acknowledgements

    Any publications based on this dataset should acknowledge the following:

    Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

    The original dataset can be found here at the UCI Machine Learning Repository.

  6. Credit_Scoring_Data

    • kaggle.com
    Updated Aug 5, 2023
    Cite
    AdityaRaj Sharma (2023). Credit_Scoring_Data [Dataset]. https://www.kaggle.com/datasets/cs49adityarajsharma/credit-scoring-data
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 5, 2023
    Dataset provided by
    Kaggle
    Authors
    AdityaRaj Sharma
    Description

    Introduction:

    This dataset analysis aims to explore and analyze a Credit Score dataset to gain insights into customer creditworthiness and segmentation. The dataset contains information on various factors that influence credit scores, such as payment history, credit utilization ratio, number of credit accounts, education level, and employment status. The analysis will utilize the k-means algorithm to perform clustering and identify distinct groups of customers based on their credit scores.

    The Credit Score dataset comprises a collection of records, each representing an individual's credit profile. It contains the following features:

    (1). Age: This feature represents the age of the individual.

    (2). Gender: This feature captures the gender of the individual.

    (3). Marital Status: This feature denotes the marital status of the individual.

    (4). Education Level: This feature represents the highest level of education attained by the individual.

    (5). Employment Status: This feature indicates the current employment status of the individual.

    (6). Credit Utilization Ratio: This feature reflects the ratio of credit used by the individual compared to their total available credit limit.

    (7). Payment History: It represents the monthly net payment behaviour of each customer, taking into account factors such as on-time payments, late payments, missed payments, and defaults.

    (8). Number of Credit Accounts: It represents the count of active credit accounts the person holds.

    (9). Loan Amount: It indicates the monetary value of the loan.

    (10). Interest Rate: This feature represents the interest rate associated with the loan.

    (11). Loan Term: This feature denotes the duration or term of the loan.

    (12). Type of Loan: It includes categories like "Personal Loan," "Auto Loan," or potentially other types of loans.
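
    The clustering step mentioned in the introduction can be illustrated with a small k-means sketch in plain Python. The two features used here (a credit utilization ratio and a payment history score, both scaled to 0-1), the six points, and the choice of k = 2 are all hypothetical; the listing does not say how its features are scaled or how many clusters the analysis uses.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    labels = [
        min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]
    return centroids, labels


# Hypothetical (credit utilization ratio, payment history score) pairs.
points = [
    (0.10, 0.90), (0.15, 0.85), (0.20, 0.95),  # low utilization, good history
    (0.80, 0.20), (0.85, 0.30), (0.90, 0.25),  # high utilization, poor history
]

centroids, labels = kmeans(points, k=2)
print(labels)
```

    Features on very different scales (for example age and loan amount) would normally be standardized first, since k-means relies on Euclidean distance.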

  7. Data from: Bank Loan

    • kaggle.com
    zip
    Updated Dec 24, 2023
    Cite
    Zahra Zolghadr (2023). Bank Loan [Dataset]. https://www.kaggle.com/datasets/zahrazolghadr/bank-loan
    Explore at:
    Available download formats: zip (12603 bytes)
    Dataset updated
    Dec 24, 2023
    Authors
    Zahra Zolghadr
    Description

    In the area of financial decision-making, a dataset named "bank-loan" takes center stage, focusing on the critical domain of credit scoring. With a pool of 700 records derived from bank customers who successfully obtained loans and conscientiously repaid their installments, the dataset captures the repayment outcomes, categorized as 1 and 0 for default statuses. The overarching objective is to develop a robust credit scoring system, a discerning arbiter for loan approvals. This system will draw on various factors, including age, education, employment duration, tenure at the current residence, income levels, debt-to-income ratio, credit-to-debt ratio, and other debts reported at the time of loan application. By delving into the intricate details of these parameters, the aim is to construct a predictive model that empowers the financial institution to make informed decisions when considering loan applications, thereby optimizing risk management and ensuring the soundness of lending practices.

    Age: Age in years.

    Ed: 1 = Did not complete high school; 2 = High school degree; 3 = Some college; 4 = College degree; 5 = Post-undergraduate degree

    Employ: Years with current employer

    Address: Years at current address

    Income: Household income in thousands

    Debtinc: Debt to income ratio (x100)

    Creddebt: Credit card debt in thousands

    Othdebt: Other debt in thousands

    Default: The target variable, indicating whether the customer previously defaulted. It takes binary values, with 1 typically denoting a "bad" default status and 0 representing a "good" repayment history.

  8. Data from: Loan Default Prediction

    • kaggle.com
    zip
    Updated Aug 15, 2022
    Cite
    Marc Buji (2022). Loan Default Prediction [Dataset]. https://www.kaggle.com/datasets/marcbuji/loan-default-prediction
    Explore at:
    Available download formats: zip (5389782 bytes)
    Dataset updated
    Aug 15, 2022
    Authors
    Marc Buji
    Description

    Banks run into losses when a customer doesn't pay their loans on time. Because of this, banks lose crores every year, which also affects the country's economic growth to a large extent. We look at various attributes such as funded amount, location, loan, balance, etc., to predict whether a person will be a loan defaulter or not. To build a model for this problem, Grant Group Funding has a dataset of 87,501 rows and 30 columns from a client in the banking sector.

    • ID: Unique ID
    • Asst_Reg: Value of all the assets registered under the borrower's name
    • GGGrade: Grant Group Grade
    • Experience: Total years of work experience of the borrower
    • Validation: Validation status of the borrower
    • Yearly Income: Total yearly income of the borrower
    • Home Status: Borrower living status
    • Unpaid 2 years: Number of times the borrower has defaulted in the last two years
    • Already Defaulted: Number of other loans on which the borrower has defaulted
    • Designation: Designation of the borrower
    • Debt to Income: Debt to income ratio
    • Postal Code: Postal code of the borrower
    • Lend Amount: Total funded amount to the borrower
    • Deprecatory Records: An entry that may be considered negative by lenders because it indicates risk and hurts your ability to qualify for credit or other services
    • Interest Charged: Interest charged on the total amount
    • Usage Rate: Processing charges on the loan amount
    • Inquiries: Inquiries in the last 6 months
    • Present Balance: Current balance in the borrower's account
    • Gross Collection: The gross amount payable by way of settlement or judgment in respect of the claims, excluding any costs
    • Sub GGGrade: Sub Grant Group Grade
    • File Status: Status of the loan file
    • State: State to which the borrower belongs
    • Account Open: Total number of open accounts in the name of the borrower
    • Total Unpaid CL: Unpaid dues on all the other loans
    • Duration: Duration for which the amount is funded to the borrower
    • Unpaid Amount: Unpaid balance on the credit card
    • Reason: Reason for loan application
    • Claim Type: The borrower's claim type among all application types (I = Individual Account, J = Joint Account)
    • Due Fee: Charges incurred if the payment on the loan amount is delayed
    • Loan/No Loan: Target variable

  9. UCI Credit Card(From Python WOE PKG)

    • kaggle.com
    zip
    Updated Apr 8, 2018
    Cite
    WilsonF (2018). UCI Credit Card(From Python WOE PKG) [Dataset]. https://www.kaggle.com/datasets/wilsonf/uci-credit-carefrom-python-woe-pkg/discussion
    Explore at:
    Available download formats: zip (2114904 bytes)
    Dataset updated
    Apr 8, 2018
    Authors
    WilsonF
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Information

    This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

    Specify the variable dtypes in config.csv.

    Convention:

    continuous variables: is_tobe_bin=1 and is_candidate=1

    discrete variables: is_tobe_bin=0 and is_candidate=1
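
    A small sketch of generating such a file follows. Only the is_tobe_bin and is_candidate flags and their values come from the convention above. The var_name column, the helper name, and the split of variables into continuous and discrete are assumptions made for illustration; the package's real config.csv may expect additional columns.

```python
import csv
import io


def build_config(continuous, discrete):
    """Return CSV text with one row per variable, flagged per the convention:
    continuous -> is_tobe_bin=1, discrete -> is_tobe_bin=0, both is_candidate=1."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["var_name", "is_tobe_bin", "is_candidate"]
    )
    writer.writeheader()
    for name in continuous:
        writer.writerow({"var_name": name, "is_tobe_bin": 1, "is_candidate": 1})
    for name in discrete:
        writer.writerow({"var_name": name, "is_tobe_bin": 0, "is_candidate": 1})
    return buffer.getvalue()


# Illustrative split; which variables count as continuous is a modelling choice.
config_text = build_config(
    continuous=["LIMIT_BAL", "AGE", "BILL_AMT1"],
    discrete=["SEX", "EDUCATION", "MARRIAGE"],
)
print(config_text)
```

    Writing config_text to a file named config.csv would then give the package something to read, subject to the caveat about extra columns.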

    Content

    There are 25 variables:

    ID: ID of each client

    LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)

    SEX: Gender (1=male, 2=female)

    EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)

    MARRIAGE: Marital status (1=married, 2=single, 3=others)

    AGE: Age in years

    PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)

    PAY_2: Repayment status in August, 2005 (scale same as above)

    PAY_3: Repayment status in July, 2005 (scale same as above)

    PAY_4: Repayment status in June, 2005 (scale same as above)

    PAY_5: Repayment status in May, 2005 (scale same as above)

    PAY_6: Repayment status in April, 2005 (scale same as above)

    BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)

    BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)

    BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)

    BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)

    BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)

    BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)

    PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)

    PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)

    PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)

    PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)

    PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)

    PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)

    default.payment.next.month: Default payment (1=yes, 0=no)

    Our target

    Build a WoE (weight of evidence) transformation for a scorecard model for credit rating.

    A Python package for this is available on GitHub:

    https://github.com/boredbird/woe
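
    For orientation, a weight of evidence transformation can be computed by hand as below. This is a generic sketch and not the API of the package linked above. It assumes the common convention WoE = ln(share of non-defaults in the category / share of defaults in the category), so a positive value marks a lower-risk category; some references flip the sign. The sample values are made up.

```python
import math
from collections import Counter


def woe_table(values, targets):
    """Return ({category: WoE}, information value) for one discrete variable.
    targets: 1 = default, 0 = no default. Every category must contain at
    least one of each outcome, otherwise the logarithm is undefined."""
    good = Counter(v for v, t in zip(values, targets) if t == 0)
    bad = Counter(v for v, t in zip(values, targets) if t == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    table, iv = {}, 0.0
    for category in sorted(set(values)):
        if good[category] == 0 or bad[category] == 0:
            raise ValueError(f"category {category!r} lacks one outcome")
        dist_good = good[category] / total_good
        dist_bad = bad[category] / total_bad
        table[category] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * table[category]
    return table, iv


# Made-up EDUCATION codes and outcomes, for illustration only.
education = [1, 1, 1, 1, 2, 2, 2, 2]
default = [0, 0, 0, 1, 0, 1, 1, 1]

table, iv = woe_table(education, default)
print(table, iv)
```

    In a scorecard workflow, each raw category would then be replaced by its WoE value before fitting a logistic regression; continuous variables are binned first and treated the same way.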

