Financial institutions incur significant losses from vehicle loan defaults. This has led to tighter vehicle loan underwriting and higher rejection rates, and these institutions have called for a better credit risk scoring model. This warrants a study of the determinants of vehicle loan default. A financial institution has hired you to accurately predict the probability of a loanee/borrower defaulting on the first EMI (Equated Monthly Instalment) of a vehicle loan on its due date. The datasets provide the following information about the loan and the loanee:
Loanee Information (demographic data such as age, identity proof, etc.)
Loan Information (disbursal details, loan-to-value ratio, etc.)
Bureau data & history (bureau score, number of active accounts, status of other loans, credit history, etc.)
Accurate prediction will ensure that clients capable of repayment are not rejected, and will identify important determinants that can be used to minimise default rates.
This dataset was created by Hareesh kay
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Overview
This dataset contains 45,000 records of loan applicants, with various attributes related to personal demographics, financial status, and loan details. The dataset can be used for predictive modeling, particularly in credit risk assessment and loan default prediction.
Dataset Content
The dataset includes 14 columns representing different factors influencing loan approvals and defaults:
Personal Information
person_age: Age of the applicant (in years).
person_gender: Gender of the applicant (male, female).
person_education: Educational background (High School, Bachelor, Master, etc.).
person_income: Annual income of the applicant (in USD).
person_emp_exp: Years of employment experience.
person_home_ownership: Type of home ownership (RENT, OWN, MORTGAGE).
Loan Details
loan_amnt: Loan amount requested (in USD).
loan_intent: Purpose of the loan (PERSONAL, EDUCATION, MEDICAL, etc.).
loan_int_rate: Interest rate on the loan (percentage).
loan_percent_income: Ratio of loan amount to income.
Credit & Loan History
cb_person_cred_hist_length: Length of the applicant's credit history (in years).
credit_score: Credit score of the applicant.
previous_loan_defaults_on_file: Whether the applicant has previous loan defaults (Yes or No).
Target Variable
loan_status: 1 if the loan was repaid successfully, 0 if the applicant defaulted.
Use Cases
Loan Default Prediction: Build a classification model to predict loan repayment.
Credit Risk Analysis: Analyze the relationship between income, credit score, and loan defaults.
Feature Engineering: Extract new insights from employment history, home ownership, and loan amounts.
Acknowledgments
This dataset is synthetic and designed for machine learning and financial risk analysis.
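As a rough illustration of the feature-engineering use case, the sketch below turns one raw record into numeric features. The column names are taken from the description above; the helper name `encode_applicant`, the choice of features, and the sample values are assumptions made for illustration, not part of the dataset.

```python
# Hypothetical helper: the column names follow the dataset description,
# everything else (function name, feature selection) is an assumption.
HOME_OWNERSHIP_CATEGORIES = ("RENT", "OWN", "MORTGAGE")


def encode_applicant(row):
    """Convert one raw applicant record (a dict) into numeric features."""
    income = float(row["person_income"])
    loan_amount = float(row["loan_amnt"])
    if income <= 0:
        raise ValueError("person_income must be positive to compute a ratio")

    features = {
        "person_age": float(row["person_age"]),
        "person_income": income,
        "loan_amnt": loan_amount,
        "credit_score": float(row["credit_score"]),
        # Ratio of loan amount to income, recomputed from the raw columns.
        "loan_percent_income": loan_amount / income,
        # Binary flag: "Yes" becomes 1, anything else becomes 0.
        "previous_default": 1 if row["previous_loan_defaults_on_file"] == "Yes" else 0,
    }
    # One-hot encode home ownership over the categories listed above.
    for category in HOME_OWNERSHIP_CATEGORIES:
        features["home_" + category] = int(row["person_home_ownership"] == category)
    return features


# Made-up sample record, for demonstration only.
sample = {
    "person_age": 29,
    "person_income": 50000,
    "loan_amnt": 10000,
    "credit_score": 640,
    "previous_loan_defaults_on_file": "No",
    "person_home_ownership": "RENT",
}
print(encode_applicant(sample))
```

Features produced this way can be passed to any classifier; columns with categories beyond those listed (the description ends some lists with "etc.") would need their own category tuples.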
https://creativecommons.org/publicdomain/zero/1.0/
Banks earn a major share of their revenue from lending, but lending carries risk: borrowers may default on their loans. To mitigate this, the banks have decided to use Machine Learning. They have collected past data on loan borrowers and would like you to develop a strong ML model to classify whether a new borrower is likely to default.
The dataset is enormous and consists of multiple determining factors such as the borrower's income, gender, loan purpose, etc. The dataset is subject to strong multicollinearity and empty values. Can you overcome these factors and build a strong classifier to predict defaulters?
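One simple way to screen for the multicollinearity and empty values mentioned above is to compute pairwise Pearson correlations over the rows where both values are present. The sketch below assumes columns are held as plain Python lists with `None` marking an empty value; the column names, sample values, and the 0.9 threshold are illustrative assumptions, not properties of the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation using only the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def highly_correlated(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if not math.isnan(r) and abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Made-up columns: monthly_income is annual_income / 12, so the pair is redundant.
columns = {
    "annual_income": [36000, 60000, None, 96000],
    "monthly_income": [3000, 5000, 7000, 8000],
    "age": [25, 52, 31, 40],
}
for a, b, r in highly_correlated(columns):
    print(a, b, round(r, 3))
```

Pairwise correlation only catches redundancy between two columns at a time; a variance inflation factor analysis would be needed to detect a column that is a combination of several others.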
This dataset was sourced from Kaggle.
This dataset was created by Luong151196@31
This dataset was created by Jonathan Wang
Loan data provided by a Chinese vehicle loan agency. The institution’s borrowers often fall behind on payments or refuse to pay, resulting in a high rate of non-performing loans. The institution would like to invite you to help build a risk identification model to predict which borrowers may default (sensitive information has been desensitized).
loan_default indicates whether the borrower will fall behind on their payments.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: Welcome to the "Loan Applicant Data for Credit Risk Analysis" dataset on Kaggle! This dataset provides essential information about loan applicants and their characteristics. Your task is to develop predictive models to determine the likelihood of loan default based on these simplified features.
In today's financial landscape, assessing credit risk is crucial for lenders and financial institutions. This dataset offers a simplified view of the factors that contribute to credit risk, making it an excellent opportunity for data scientists to apply their skills in machine learning and predictive modeling.
Column Descriptions:
Explore this dataset, preprocess the data as needed, and develop machine learning models, especially using Random Forest, to predict loan default. Your insights and solutions could contribute to better credit risk assessment methods and potentially help lenders make more informed decisions.
Remember to respect data privacy and ethics guidelines while working with this data. Good luck, and happy analyzing!
The data set is based upon Lending Club Information (https://www.kaggle.com/prateikmahendra/loan-data).
- The Irish Dummy Bank is a peer-to-peer lending bank based in Ireland, in which the bank provides funds to potential borrowers and earns a profit depending on the risk it takes (the borrower's credit score). The Irish Fake Bank provides loans to its loyal customers. The complete data set is borrowed from Lending Club. For more basic information about the company, please check out the Wikipedia article about it. This dataset was copied and cleaned from Kaggle, but it has been changed. Any similarity is for learning purposes only. I have no intention of plagiarism; I just want to be clear about that.
Lending Club Information: https://en.wikipedia.org/wiki/Lending_Club
The central idea and code are adapted from Kevin Markham's YouTube video series, Introduction to machine learning with scikit-learn. You can find the link under the resources section.
LoanStatNew Description
addr_state The state provided by the borrower in the loan application
annual_inc The self-reported annual income provided by the borrower during registration.
annual_inc_joint The combined self-reported annual income provided by the co-borrowers during registration
application_type Indicates whether the loan is an individual application or a joint application with two co-borrowers
collection_recovery_fee post charge off collection fee
collections_12_mths_ex_med Number of collections in 12 months excluding medical collections
delinq_2yrs The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years
desc Loan description provided by the borrower
dti A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
dti_joint A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
earliest_cr_line The month the borrower's earliest reported credit line was opened
emp_length Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
emp_title The job title supplied by the Borrower when applying for the loan.
fico_range_high The upper boundary range the borrower’s FICO at loan origination belongs to.
fico_range_high The upper boundary range the borrower’s FICO at loan origination belongs to.
fico_range_low The lower boundary range the borrower’s FICO at loan origination belongs to.
funded_amnt The total amount committed to that loan at that point in time.
funded_amnt_inv The total amount committed by investors for that loan at that point in time.
grade LC assigned loan grade
home_ownership The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
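To make a few of the definitions above concrete, the sketch below computes a debt-to-income ratio as described for dti, maps raw years of employment onto the 0 to 10 scale described for emp_length, and collapses the two FICO boundaries into a single number. The function names are hypothetical, and two choices are assumptions rather than documented behaviour: whether the real dti column is stored as a fraction or a percentage, and the use of a midpoint for the FICO range.

```python
def debt_to_income(monthly_debt_payments, monthly_income):
    """Monthly debt payments (excluding mortgage and the requested loan)
    divided by self-reported monthly income. Returned as a plain fraction;
    the source column may use a different scale (assumption)."""
    if monthly_income <= 0:
        return None  # ratio is undefined without positive income
    return monthly_debt_payments / monthly_income


def encode_emp_length(years_employed):
    """Map years of employment onto 0..10: 0 means less than one year,
    10 means ten or more years."""
    if years_employed < 1:
        return 0
    return min(int(years_employed), 10)


def fico_midpoint(fico_range_low, fico_range_high):
    """Single-number summary of the FICO range (a common simplification,
    not something the data dictionary prescribes)."""
    return (fico_range_low + fico_range_high) / 2


print(debt_to_income(500, 2500))   # 0.2
print(encode_emp_length(0.5))      # 0
print(encode_emp_length(12))       # 10
print(fico_midpoint(700, 704))     # 702.0
```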
This dataset was created by Subbu
This is a synthetic dataset created using actual data from a financial institution. The data has been modified to remove identifiable features, and the numbers have been transformed so that they cannot be linked to the original source (the financial institution).
It is intended for academic use by beginners who want to practice financial analytics on a simple financial dataset.
This data set contains customers and their account and loan details, distributed across multiple data files.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains about 2.4 million rows. Some of the sensitive information has been encoded. The dataset requires some data cleaning, such as handling null values and outliers. The target column should be the default column.
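A minimal sketch of the kind of cleaning mentioned above, assuming a numeric column held as a Python list with `None` for nulls. Median imputation and clipping at 1.5 times the interquartile range are common defaults chosen here for illustration; the dataset description does not prescribe either, and the sample values are made up.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] to limit the effect of outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Made-up column with one null and one extreme value.
raw = [10, 12, None, 11, 13, 12, 11, 500]
filled = impute_median(raw)
cleaned = clip_iqr(filled)
print(filled)
print(cleaned)
```

For a table of about 2.4 million rows a vectorised library would normally be used instead; the logic stays the same.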
This loan dataset is a good source to perform and practice credit risk analysis for loans. We should try to calculate probability of default using this dataset and use it to predict future default scenarios.
This dataset was created by Sheshank Joshi
https://creativecommons.org/publicdomain/zero/1.0/
A Personal Loan is an unsecured product, so it is vital to assess customer risk by checking creditworthiness in order to prevent loan defaults.
The objective is to build a Risk model using the dataset which will assess the risk of a customer defaulting after cross-selling the Personal Loan.
Column Descriptions:
V1: Customer ID
V2: If a customer has bounced in first EMI (1: Bounced, 0: Not bounced)
V3: Number of times bounced in recent 12 months
V4: Maximum MOB (Month of business with TVS Credit)
V5: Number of times bounced while repaying the loan
V6: EMI
V7: Loan Amount
V8: Tenure
V9: Dealer codes from where customer has purchased the Two wheeler
V10: Product code of Two wheeler (MC: Motorcycle, MO: Moped, SC: Scooter)
V11: No of advance EMI paid
V12: Rate of interest
V13: Gender (Male/Female)
V14: Employment type (HOUSEWIFE: housewife, SELF: Self-employed, SAL: Salaried, PENS: Pensioner, STUDENT: Student)
V15: Resident type of customer
V16: Date of birth
V17: Age at which customer has taken the loan
V18: Number of loans
V19: Number of secured loans
V20: Number of unsecured loans
V21: Maximum amount sanctioned in the Live loans
V22: Number of new loans in last 3 months
V23: Total sanctioned amount in the secured Loans which are Live
V24: Total sanctioned amount in the unsecured Loans which are Live
V25: Maximum amount sanctioned for any Two wheeler loan
V26: Time since last Personal loan taken (in months)
V27: Time since first consumer durables loan taken (in months)
V28: Number of times 30 days past due in last 6 months
V29: Number of times 60 days past due in last 6 months
V30: Number of times 90 days past due in last 3 months
V31: Tier (Customer’s geographical location)
V32: Target variable (1: Defaulters / 0: Non-Defaulters)
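The EMI, loan amount, tenure, and rate of interest columns are related through the standard reducing-balance EMI formula, EMI = P * r * (1 + r)^n / ((1 + r)^n - 1), where P is the principal, r the monthly rate, and n the number of monthly instalments. The sketch below applies it under two assumptions that the column descriptions do not confirm: that the rate is an annual percentage and that the tenure is in months. If the lender quotes a flat rate instead, the stored EMI will differ from this calculation.

```python
def emi(principal, annual_rate_percent, tenure_months):
    """Equated monthly instalment on a reducing-balance loan.

    Assumes an annual percentage rate and a tenure in months; both are
    assumptions about the dataset's units.
    """
    monthly_rate = annual_rate_percent / 100 / 12
    if monthly_rate == 0:
        # With no interest the principal is simply split evenly.
        return principal / tenure_months
    growth = (1 + monthly_rate) ** tenure_months
    return principal * monthly_rate * growth / (growth - 1)


# Made-up loan: 100,000 at 12% per year over 12 months.
print(round(emi(100000, 12, 12), 2))
```

Comparing a recomputed EMI with the stored one can serve as a sanity check or as an engineered feature.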
Data Dictionary:
Title: Credit data
Source: Credit One Bank
Number of Instances: 5000
Name of Dataset: Analysis_of_Default
Number of Attributes: 20 (7 numerical, 13 categorical)
Attribute description
Attribute 1: (Qualitative / Categorical)
Status of existing checking account
A11: ... < 0 USD
A12: 0 <= ... < 10000 USD
A13: ... >= 10000 USD
A14: no checking account
Attribute 2: (numerical) Duration in months
Attribute 3: (Qualitative / Categorical)
Credit history
A30: no credits taken/all credits paid back duly
A31: all credits at this bank paid back duly
A32: existing credits paid back duly till now
A33: delay in paying off in the past
A34: critical account/other credits existing (not at this bank)
Attribute 4: (Qualitative / Categorical)
Purpose
A40: car (new)
A41: car (used)
A42: furniture/equipment
A43: radio/television
A44: domestic appliances
A45: repairs
A46: education
A47: (vacation - does not exist?)
A48: retraining
A49: business
A410: others
Attribute 5: (numerical) Credit amount
Attribute 6: (Qualitative / Categorical)
Savings account/bonds
A61: ... < 1000 USD
A62: 1000 <= ... < 5000 USD
A63: 5000 <= ... < 10000 USD
A64: .. >= 10000 USD
A65: unknown/ no savings account
Attribute 7: (Qualitative / Categorical)
Present employment since
A71: unemployed
A72: ... < 1 year
A73: 1 <= ... < 4 years
A74: 4 <= ... < 7 years
A75: .. >= 7 years
Attribute 8: (numerical) Installment rate in percentage of disposable income
Attribute 9: (Qualitative / Categorical)
Personal status and sex
A91: male : divorced/separated
A92: female: divorced/separated/married
A93: male : single
A94: male : married/widowed
A95: female: single
Attribute 10: (Qualitative / Categorical)
Other debtors / guarantors
A101: none
A102: co-applicant
A103: guarantor
Attribute 11: (numerical) Present residence since
Attribute 12: (Qualitative / Categorical)
Property
A121: real estate
A122: if not A121: building society savings agreement/ life insurance
A123: if not A121/A122: car or other, not in attribute 6
A124: unknown / no property
Attribute 13: (numerical) Age in years
Attribute 14: (Qualitative / Categorical)
Other installment plans
A141: bank
A142: stores
A143: none
Attribute 15: (Qualitative / Categorical)
Housing
A151: rent
A152: own
A153: for free
Attribute 16: (numerical) Number of existing credits at this bank
Attribute 17: (Qualitative / Categorical)
Job
A171: unemployed/ unskilled - non-resident
A172: unskilled - resident
A173: skilled employee / official
A174: management/ self-employed/ highly qualified employee/ officer
Attribute 18: (numerical) Number of people being liable to provide maintenance for
Attribute 19: (Qualitative / Categorical)
Telephone
A191: none
A192: yes, registered under the customer’s name
Attribute 20: (Qualitative / Categorical)
Foreign worker
A201: yes
A202: no
Target variable: 1 (Defaulted), 0 (No Default)
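Because the categorical attributes are stored as opaque codes (A11, A152, and so on), a small codebook makes them readable and lets them be one-hot encoded for modelling. The sketch below covers only two of the attributes from the data dictionary above; the dictionary and function names are hypothetical.

```python
# Codebooks copied from the data dictionary above (two attributes only).
CHECKING_STATUS = {
    "A11": "... < 0 USD",
    "A12": "0 <= ... < 10000 USD",
    "A13": "... >= 10000 USD",
    "A14": "no checking account",
}
HOUSING = {
    "A151": "rent",
    "A152": "own",
    "A153": "for free",
}


def decode(code, codebook):
    """Return the human-readable label for a category code."""
    return codebook[code]


def one_hot(code, codebook):
    """One-hot encode a category code against every code in its codebook."""
    if code not in codebook:
        raise KeyError("unknown category code: " + repr(code))
    return {known: int(known == code) for known in codebook}


print(decode("A14", CHECKING_STATUS))
print(one_hot("A152", HOUSING))
```

Raising on unknown codes is a deliberate choice here: it surfaces data-entry errors instead of silently producing an all-zero row.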
This dataset was created by Ali91Saif
Loan providers find it hard to lend to people with insufficient or non-existent credit history. Some consumers take advantage of this by becoming defaulters.
When the company receives a loan application, it has to decide whether to approve the loan based on the applicant’s profile. Two types of risk are associated with the company’s decision:
a. If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
b. If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.
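The two risks above can be weighed against each other with a simple expected-value rule: approving earns a profit if the loan is repaid and incurs a loss if it is not. The sketch below is one possible way to frame the decision, not the company's actual policy; the function names and the profit and loss figures are made up for illustration.

```python
def expected_value_of_approval(p_default, profit_if_repaid, loss_if_default):
    """Expected gain from approving a loan with the given default probability."""
    return (1 - p_default) * profit_if_repaid - p_default * loss_if_default


def should_approve(p_default, profit_if_repaid, loss_if_default):
    """Approve only when the expected gain is positive.

    Rejecting is treated as a gain of zero, so risk (a), the lost business,
    shows up as the positive value forgone when a good applicant is refused.
    """
    return expected_value_of_approval(p_default, profit_if_repaid, loss_if_default) > 0


def break_even_default_probability(profit_if_repaid, loss_if_default):
    """Default probability at which approving and rejecting are equally good."""
    return profit_if_repaid / (profit_if_repaid + loss_if_default)


# Made-up figures: 200 profit on a repaid loan, 1000 lost on a default.
print(should_approve(0.05, 200, 1000))
print(should_approve(0.30, 200, 1000))
print(round(break_even_default_probability(200, 1000), 4))
```

With these figures the break-even point is a default probability of one in six, so any model threshold should be set with the relative sizes of the two costs in mind rather than fixed at 0.5.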
When a client applies for a loan, there are four types of decisions that could be taken by the client/company:
Approved: The company has approved the loan application.
Cancelled: The client cancelled the application sometime during approval, either because the client changed their mind about the loan or, in some cases, because a higher risk rating led to worse pricing that the client did not want.
Refused: The company rejected the loan (because the client does not meet its requirements, etc.).
Unused offer: The loan was cancelled by the client, but at a different stage of the process.
The objective is to identify patterns which indicate if a client has difficulty paying their installments which may be used for taking actions such as denying the loan, reducing the amount of loan, lending (to risky applicants) at a higher interest rate, etc. This will ensure that the consumers capable of repaying the loan are not rejected.
A retail bank would like to hire you to build a credit default model for their credit card portfolio. The bank expects the model to identify the consumers who are likely to default on their credit card payments over the next 12 months. This model will be used to reduce the bank’s future losses. The bank is willing to provide you with some sample data that they can currently extract from their systems. This data set (credit_data.csv) consists of 13,444 observations with 14 variables.
Based on the bank’s experience, the number of derogatory reports is a strong indicator of default. This is all the information you are able to get from the bank at the moment. Currently, they do not have the expertise to provide any clarification on this data and are also unsure about other variables captured by their systems.
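A quick way to check the bank's claim about derogatory reports is to compare the observed default rate across groups with different report counts. The sketch below assumes records are dicts with hypothetical keys `derogatory_reports` and `default`; the actual column names in credit_data.csv are not given, and the sample records are made up.

```python
from collections import defaultdict


def default_rate_by_group(records, group_key, target_key):
    """Return {group value: share of records with target == 1}."""
    totals = defaultdict(lambda: [0, 0])  # group -> [defaults, count]
    for record in records:
        bucket = totals[record[group_key]]
        bucket[0] += record[target_key]
        bucket[1] += 1
    return {group: defaults / count
            for group, (defaults, count) in sorted(totals.items())}


# Made-up records; key names are assumptions, not the real column names.
records = [
    {"derogatory_reports": 0, "default": 0},
    {"derogatory_reports": 0, "default": 0},
    {"derogatory_reports": 0, "default": 1},
    {"derogatory_reports": 0, "default": 0},
    {"derogatory_reports": 2, "default": 1},
    {"derogatory_reports": 2, "default": 0},
]
print(default_rate_by_group(records, "derogatory_reports", "default"))
```

A rate that rises steadily with the report count would support using the variable as a predictor; groups with very few records should be read with caution.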