This dataset was created by Anoop E R
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lending Club offers peer-to-peer (P2P) loans through a technological platform for various personal finance purposes and is today one of the companies that dominate the US P2P lending market. The original dataset is publicly available on Kaggle and covers all loans issued by Lending Club between 2007 and 2018. The present version of the dataset is intended for building a granting model, that is, a model that decides whether to grant a loan based on the information available at the time of the loan application. Consequently, our dataset contains only a selection of variables from the original one, namely those known at the moment the loan request is made. Furthermore, the target variable of a granting model represents the final status of the loan, which is either "default" or "fully paid". We therefore filtered out of the original dataset all loans in transitory states. Our dataset comprises 1,347,681 records or obligations (approximately 60% of the original), and it was also cleaned for completeness and consistency (less than 1% of our dataset was filtered out).
TARGET VARIABLE
The dataset includes a target variable based on the final resolution of the credit: the default category corresponds to the event charged off and the non-default category to the event fully paid. Other values of the loan status variable are not considered, since this variable represents the state of the loan at the end of the considered time window; thus, there are no loans in transitory states. The original dataset includes the target variable “loan status”, which contains several categories ('Fully Paid', 'Current', 'Charged Off', 'In Grace Period', 'Late (31-120 days)', 'Late (16-30 days)', 'Default'). In our dataset, however, we consider only loans that are either “Fully Paid” or “Default” and transform this variable into a binary variable called “Default”, with 0 for fully paid loans and 1 for defaulted loans.
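The filtering and binarisation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the raw field is named `loan_status` and, following the first sentence of this section, treats 'Charged Off' as the default event. The helper name `build_target` is hypothetical.

```python
# Assumed mapping from final loan statuses to the binary target.
# 'Charged Off' -> 1 follows the description above; adjust if the
# source data encodes defaults differently.
FINAL_STATUS_TO_DEFAULT = {"Fully Paid": 0, "Charged Off": 1}


def build_target(loans):
    """Keep only loans with a final status and attach a binary 'Default' field."""
    labelled = []
    for loan in loans:
        status = loan.get("loan_status")
        if status in FINAL_STATUS_TO_DEFAULT:
            # Copy the record so the input list is left untouched.
            labelled.append({**loan, "Default": FINAL_STATUS_TO_DEFAULT[status]})
    return labelled


sample = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": "Current"},  # transitory state, dropped
    {"id": 3, "loan_status": "Charged Off"},
]
labelled = build_target(sample)
```

Loans in transitory states such as 'Current' or 'In Grace Period' simply fall outside the mapping and are dropped.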
EXPLANATORY VARIABLES
The explanatory variables that we use correspond only to the information available at the time of the application. Variables such as the interest rate, grade, or subgrade are generated by the company as a result of a credit risk assessment process, so they were removed from the dataset: they must not be used in risk models that predict default at the granting stage.
Loan identification variables:
id: Loan id (unique identifier).
issue_d: Month and year in which the loan was approved.
Quantitative variables:
revenue: Borrower's self-declared annual income during registration.
dti_n: Indebtedness ratio for obligations excluding mortgage, based on monthly information. The ratio is calculated for the whole group of applicants: the co-borrowers’ total monthly payments on their total debt obligations divided by the co-borrowers’ combined monthly income.
loan_amnt: Amount of credit requested by the borrower.
fico_n: Credit score defined between 300 and 850, reported by Fair Isaac Corporation as a risk measure based on historical credit information reported at the time of application. This value has been calculated as the average of the variables “fico_range_low” and “fico_range_high” in the original dataset.
experience_c: Binary variable that indicates whether the borrower is new to the entity. This variable is constructed from the credit date of the previous obligation in LC and the credit date of the current obligation; if the difference between dates is positive, it is not considered as a new experience with LC.
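Two of the quantitative variables above are simple derived quantities, and a short sketch may make the definitions concrete. The function names are hypothetical, and returning the indebtedness ratio as a plain fraction (rather than a percentage) is an assumption; the source does not state the scaling.

```python
def fico_n(fico_range_low, fico_range_high):
    """Average of the two FICO range bounds from the original dataset."""
    return (fico_range_low + fico_range_high) / 2


def dti_ratio(total_monthly_debt_payments, combined_monthly_income):
    """Monthly debt payments (excluding mortgage) over combined monthly income.

    Returns None when income is not positive, since the ratio is undefined.
    """
    if combined_monthly_income <= 0:
        return None
    return total_monthly_debt_payments / combined_monthly_income
```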
Categorical variables:
emp_length: Categorical variable with the employment length of the borrower (includes the no information category).
purpose: Credit purpose category for the loan request.
home_ownership_n: Homeownership status provided by the borrower in the registration process. Categories defined by LC: “mortgage”, “rent”, “own”, “other”, “any”, “none”. We merged the categories “other”, “any” and “none” as “other”.
addr_state: US state of the borrower's residence.
zip_code: Zip code of the borrower's residence.
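The category merge described for home_ownership_n can be sketched as a small lookup. The function name is hypothetical, and lower-casing the raw value before comparison is an assumption about how the original categories are spelled.

```python
# Categories that the description says were merged into "other".
MERGED_INTO_OTHER = {"other", "any", "none"}


def normalize_home_ownership(raw):
    """Map a raw home ownership value to mortgage / rent / own / other."""
    value = raw.strip().lower()
    return "other" if value in MERGED_INTO_OTHER else value
```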
Textual variables:
title: Title of the credit request description provided by the borrower.
desc: Description of the credit request provided by the borrower.
We cleaned the textual variables. First, we removed all descriptions that contained the default description provided by Lending Club on its web form (“Tell your story. What is your loan for?”). Second, we removed the prefix “Borrower added on DD/MM/YYYY >” from the descriptions to avoid any temporal information in them. Finally, as these descriptions came from a web form, we replaced all HTML entities with their corresponding characters (e.g. “&amp;” was replaced by “&”, “&lt;” was replaced by “<”, etc.).
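The three cleaning steps can be sketched with the Python standard library. This is an illustration under stated assumptions, not the authors' pipeline: the function name `clean_description` is hypothetical, the prefix pattern is inferred from the wording above (a two- or four-digit year is accepted), and a description containing the default prompt is dropped entirely by returning `None`.

```python
import html
import re

DEFAULT_PROMPT = "Tell your story. What is your loan for?"

# Assumed shape of the prefix: "Borrower added on DD/MM/YYYY >".
PREFIX_RE = re.compile(r"Borrower added on \d{2}/\d{2}/\d{2,4} >\s*")


def clean_description(desc):
    """Apply the three cleaning steps in the order described in the text."""
    # Step 1: drop descriptions that contain the web form's default prompt.
    if desc is None or DEFAULT_PROMPT in desc:
        return None
    # Step 2: strip every dated "Borrower added on ... >" prefix.
    text = PREFIX_RE.sub("", desc)
    # Step 3: decode HTML entities such as &amp; and &lt;.
    text = html.unescape(text)
    return text.strip()
```

`html.unescape` handles named and numeric entities alike, so no hand-written entity table is needed.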
This dataset has been used in the following academic articles:
This dataset was created by Debdatta Chatterjee
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a modified version of the Kaggle Lending Club dataset found at https://www.kaggle.com/datasets/wordsforthewise/lending-club, including a model trained on the training set.
The data contains 2007 through 2018 Lending Club accepted and rejected loan data.
The dataset is licenced under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
These tables provide additional detail on the loan assets of U.S. depository institutions by reporting mortgage and consumer loan portfolios broken down by the banks' estimates of the probability of default, as defined below. This information facilitates analysis of the potential concentration of risk in specific loan categories. The institutions reporting this information are generally those with $10 billion or more of assets.
Financial institutions incur significant losses due to defaults on vehicle loans. This has led to tighter vehicle loan underwriting and higher vehicle loan rejection rates, and these institutions have also raised the need for a better credit risk scoring model. This warrants a study to estimate the determinants of vehicle loan default. A financial institution has hired you to accurately predict the probability of a loanee/borrower defaulting on the first EMI (Equated Monthly Instalment) of a vehicle loan on its due date. The following information regarding the loan and loanee is provided in the datasets:
Loanee information (demographic data such as age, identity proof, etc.)
Loan information (disbursal details, loan-to-value ratio, etc.)
Bureau data and history (bureau score, number of active accounts, the status of other loans, credit history, etc.)
Doing so will ensure that clients capable of repayment are not rejected, and it will identify important determinants that can be used to minimise default rates.
https://doi.org/10.4121/resource:terms_of_use
Loan application example, configuration 4
Parent item: Loan application example
A collection of artificial event logs describing 4 variants of a simple loan application process. Variant 1 is the most complex process with parallelism and choices. The other 3 variants have a simpler, more sequential, control flow and some activities of variant 1 are missing or split into 2. These event logs are used to test different approaches of discovering a configurable process model from a collection of event logs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
Data for the study has been retrieved from a publicly available data set of a leading European P2P lending platform, Bondora (https://www.bondora.com/en). The retrieved data is a pool of both defaulted and non-defaulted loans from the time period between 1st March 2009 and 27th January 2020. The data comprises demographic and financial information of borrowers and loan transactions. In P2P lending, loans are typically uncollateralized and lenders seek higher returns as compensation for the financial risk they take. In addition, they need to make decisions under information asymmetry that works in favor of the borrowers. In order to make rational decisions, lenders want to minimize the risk of default of each lending decision and realize the return that compensates for the risk.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study explores the potential of utilizing alternative data sources to enhance the accuracy of credit scoring models, compared to relying solely on traditional data sources, such as credit bureau data. A comprehensive dataset from the Home Credit Group’s home loan portfolio is analysed. The research examines the impact of incorporating alternative predictors that are typically overlooked, such as an applicant’s social network default status, regional economic ratings, and local population characteristics. The modelling approach applies the model-X knockoffs framework for systematic variable selection. By including these alternative data sources, the credit scoring models demonstrate improved predictive performance, achieving an area under the curve metric of 0.79360 on the Kaggle Home Credit default risk competition dataset, outperforming models that relied solely on traditional data sources, such as credit bureau data. The findings highlight the significance of leveraging diverse, non-traditional data sources to augment credit risk assessment capabilities and overall model accuracy.
https://www.promarketreports.com/privacy-policy
The global Auto Loan Origination Software market size was valued at USD 1.71 billion in 2023 and is projected to expand at a CAGR of 7.17% from 2023 to 2033, reaching a value of USD 3.48 billion by 2033. The increasing adoption of digital lending platforms, the rising demand for auto financing, and the growing popularity of online loan applications are key factors driving the growth of the market. The market is segmented into various categories based on deployment type, business model, end-use sector, loan type, and features. Cloud-based deployment is gaining popularity due to its scalability, cost-effectiveness, and ease of integration with other systems. SaaS is the most widely adopted business model, as it allows businesses to access software on a subscription basis without the need for upfront capital investment. Banks and credit unions are the dominant end-use sectors, followed by FinTech companies and independent lenders. New auto loans are the most common loan type, while automated decisioning and document management are key features offered by auto loan origination software solutions. Key players in the market include Finastra, Byte Software, Hyundai Capital America, DealerSocket, and Novantas. Key drivers for this market are: integration with AI, cloud-based deployment, mobile-first lending, personalized experiences, and data-driven insights. Potential restraints include: increased digitalization, adoption of AI and automation, growing demand for personalized lending, regulatory compliance, and rising penetration of mobile devices.
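For readers who want to check growth figures of this kind, the compound annual growth rate (CAGR) relation can be sketched as below. The function names are hypothetical. Published market figures depend on the base year and on rounding, so a projection computed this way will not necessarily reproduce a report's end value exactly.

```python
def project_value(start_value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that links a start and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Inputs taken from the paragraph above; treating 2023 to 2033 as
# 10 compounding periods is an assumption.
projected_2033 = project_value(1.71, 0.0717, 10)
rate_from_endpoints = implied_cagr(1.71, 3.48, 10)
```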
This dataset was created by jasonruan2022
Many people assume that poor credit scores translate to higher interest rates. But is this assumption true? Follow Jonathan Blum, New York author and journalist, as he attempts to answer this question using GIS. In this lesson, you'll map variations in online loan interest rates. Then, you'll use regression analysis to build a predictive model, quantifying the relationship between interest rates and loan grade rankings.
This workflow can be used to map and measure the correlation between any two variables. It's perfect for anyone interested in regression analysis in ArcGIS Pro.
In this lesson you will build skills in these areas:
Learn ArcGIS is a hands-on, problem-based learning website using real-world scenarios. Our mission is to encourage critical thinking, and to develop resources that support STEM education.
The Porsche Taycan was the electric vehicle with the highest average loan payment in the United States as of the fourth quarter of 2021, at over 1,800 U.S. dollars. It was followed by the Audi e-tron GT, at just under 1,700 U.S. dollars. The Porsche Taycan was the eighth best-selling electric car model in the United States in 2021. The Tesla Model Y, the 2021 best-seller, had an average loan payment of 757 U.S. dollars.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Based on unique data we show that macro variables, the default rate and loss given default of bank loans share common cyclical components. The innovation in our model is the distinction between loans with either severe or mild losses. The variation in the proportion of these two types drives the cyclic behavior of the loss given default and constitutes the links with the default rate and macro variables. These links vary according to loan and borrower characteristics. During downturns, the proportion of defaults with severe losses increases, but the distribution of losses conditional on their being mild or severe does not change. Although loans are monitored more closely than bonds and are more senior, the cyclical variation in their losses resembles that for bonds, albeit around a lower average level. This variation leads to an increase in the capital reserves required for loan portfolios.
https://www.archivemarketresearch.com/privacy-policy
The global mortgage loan service market is experiencing robust growth, driven by factors such as increasing urbanization, rising disposable incomes, and favorable government policies promoting homeownership. The market, valued at approximately $2 trillion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 6% from 2025 to 2033. This expansion is fueled by a burgeoning demand for both residential and commercial mortgages, particularly in emerging economies with rapidly expanding middle classes. The residential segment currently dominates the market share, accounting for approximately 70%, with individual borrowers representing the largest application segment. However, the commercial estate and enterprise segments are witnessing significant growth, driven by increased corporate investments and infrastructural development. Key players like Rocket Mortgage, United Shore Financial Services, and Quicken Loans are leveraging technological advancements such as online platforms and AI-powered loan processing to enhance efficiency and customer experience, shaping the competitive landscape. The growth trajectory is expected to be influenced by fluctuating interest rates, macroeconomic conditions, and evolving regulatory frameworks. Nevertheless, the long-term outlook remains positive, underpinned by the fundamental drivers mentioned above. Technological advancements, particularly in fintech, are reshaping the mortgage loan service landscape. The rise of digital platforms, streamlined application processes, and enhanced data analytics are significantly improving accessibility and speed of loan approvals. This efficiency boost is leading to increased competition, encouraging lenders to offer more competitive interest rates and flexible repayment options to attract borrowers. Furthermore, the increasing adoption of alternative credit scoring models is broadening access to mortgage loans for previously underserved populations. 
Regional variations in market growth are expected, with North America and Asia-Pacific representing the largest markets. However, emerging economies in regions like South America and Africa hold significant potential for future growth, given the increasing demand for housing and infrastructural development within these markets. Geographic expansion and strategic partnerships remain key strategies for players aiming for market dominance within this evolving sector.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model N reported $281.2M in Loan Capital for its fiscal quarter ending in March 2024. Data for Model N | MODN - Loan Capital, including historical data, tables and charts, were last updated by Trading Economics in March 2025.
This dataset provides a comprehensive monthly breakdown of mortgage loan data spanning from 1986 to 2019. Key metrics included are the contract interest rate, initial fees and charges, effective interest rate, term to maturity, mortgage loan amount, purchase price, and the loan-to-price ratio. The data offers valuable insights for analyzing trends in the mortgage market over more than three decades, making it a crucial resource for economists, financial analysts, and researchers interested in the evolution of housing finance. Additionally, the dataset can be used for predictive modeling and comparative market analysis.
https://www.verifiedmarketresearch.com/privacy-policy/
Commercial Loan Software Market size was valued at USD 15.09 Billion in 2024 and is projected to reach USD 29.19 Billion by 2031, growing at a CAGR of 8.6% from 2024 to 2031.
Commercial Loan Software Market Drivers
The increase in demand for self-service models to support the collection process and the increasing need to offer customer-centric commercial loans are among the other important factors expected to drive the Commercial Loan Software Market in the future. The Global Commercial Loan Software Market report provides a holistic evaluation of the market. The report offers a comprehensive analysis of key segments, trends, drivers, restraints, competitive landscape, and factors that are playing a substantial role in the market.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains realistic synthetic data generated with a commercial tool, taking as input a real dataset of CaixaBank’s express loans covering a timespan of 18 months. The real dataset was tagged in order to identify the confirmed and tentative fraud cases in which a fraudster has impersonated the client to claim that type of loan and steal the client’s funds. The dataset includes several indicators that help fraud analysts identify any suspicious behaviour of the user that could imply an impersonation or misbehaviour. This dataset was used in the INFINITECH H2020 project to build an AI model for cyberfraud prevention in this type of operation, which is especially critical because of two factors. First, it is a type of loan, an operation in which the fraudster can steal money that the client does not really own, so it can be stolen even from clients without funds in their accounts. Second, it is an operation that was offered to clients to speed up the process of acquiring loans of small amounts. Fraudsters can take advantage of that and steal the money faster as well. The data fields included in the dataset are detailed in the table below.
Field name | Value example | Field description
Fraud | 0 | Indicates whether fraud occurred in the operation (0 No; 1 Intent of fraud; 2 Completed fraud, money stolen).
PK_ANYOMES | 202102 | Year and month of the loan constitution operation.
PK_ANYOMESDIA | 20210207 | Day of the loan constitution operation.
PK_TSINSERCION | 06:28,0 | Time of the loan constitution operation.
IDE_USUCLO_ORIG | 1321946400 | User associated with the online banking contract and the client. It is an internal user ID that is used jointly with PK_CONTRATO to access the services under the online banking contract.
PK_CONTRATO | 1096097250023219464 | Online banking contract code. It is the identifier of the online banking services.
FK_NUMPERSO | 27388223 | Unique ID that identifies the physical person (client) who is connecting to online banking.
IDE_SAU | 08875268 | Identifier used by the client to access online banking. This identifier is used jointly with the CARPETA ID to access online banking services.
CARPETA | 49830679 | Folder in which the client's online banking services are stored. It is used jointly with the client's online banking identifier (IDE_SAU).
FK_COD_OPERACION | 03693 | Loan constitution transaction code. Unique ID that identifies the loan.
DES_OPERACION | CONSTITUCION PRESTAMO | Description of the loan constitution operation.
IP_TERMINAL | AAHUAWPOTLXYxgaNLC zWp70Yp+MaW2i1qEkh0o= | IP of the terminal or hash of the mobile device from which the client connects to online banking.
FK_NUMPERSO_TIT_LOE | 27388223 | Identifier of the physical person who is the online banking contract holder. It can differ from FK_NUMPERSO if FK_NUMPERSO is a person authorised to operate the online banking services of FK_NUMPERSO_TIT_LOE. This can happen whether FK_NUMPERSO_TIT_LOE represents a physical or a legal person (enterprise).
FK_CONTRATO_PPAL_OPE | 1001037520210005473 | Contract code of the savings account in which the loan is deposited. This is not the same contract as the online banking contract.
FK_IMPORTE_PRINCIPAL | 1500 | Loan amount requested.
IND_MFA_OPE | 0 | Indicator of the response of the SCA (Strong Customer Authentication) request decision algorithm for the loan consolidation operation (0 No; 1 Yes; -1 Unknown).
MESSAGE_MFA_OPE | KNOWN USER AND DEVICE | SCA (Strong Customer Authentication) request decision algorithm response message for the loan consolidation operation.
SALDO_ANTES_PRESTAMO | 100 | Balance of the account into which the loan is deposited, just before the loan.
POSICION_GLOBAL_ANTES_PRESTAMO | 1 | Global balance of the client before the loan (1: <1000; 2: 1000-10000; 3: 10000-50000; 4: 50000-250000; 5: >250000; -2: Data not found).
IND_NUEVO_IDE_SAU | 0 | Indicates whether the identifier used to access online banking was created in the last 48 hours (0 No; 1 Yes; -1 Unknown).
FECHA_ALTA_CLIENTE | 39246 | Date of registration with CaixaBank as a customer, i.e. when the physical person (FK_NUMPERSO) became a client of CaixaBank.
IND_ALTA_SIGN | 0 | Indicates whether the client has registered a sign in the last 48 hours (0 No; 1 Yes; -1 Unknown).
IND_GMP_ANT | 0 | Indicates whether there has been a new primary mobile assignment in the 48 hours prior to the loan (0 No; 1 Yes; -1 Unknown).
IND_INGRESO_NOMINA | 1 | Indicates whether the payroll of FK_NUMPERSO is domiciled at CaixaBank (0 No; 1 Yes).
IND_PENSION | 0 | Indicates whether FK_NUMPERSO has the pension domiciled at CaixaBank (0 No; 1 Yes).
IND_IMAGIN_BANK | 1 | Indicates whether FK_NUMPERSO is an ImaginBank customer (0 No; 1 Yes).
IND_EXTRANJERO | 0 | Indicates whether FK_NUMPERSO is a foreigner (0 National; 1 Foreigner).
IND_RESIDENTE | 1 | Indicates whether FK_NUMPERSO resides in Spain (0 No; 1 Yes).
FK_TIPREL | 1 | Type of ownership of the savings account in which the loan is deposited (values between 1 and 48). 1 means account holder. Other values mean other types of relationship (e.g. "authorized person but not an owner of the account").
FK_ORDREL | 1 | Order of the ownership relationship. If there is more than one account holder, the position of FK_NUMPERSO among them.
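Several fields in the table above are stored as codes. The sketch below shows one way to decode a few of them into readable values. It is illustrative only: the function names are hypothetical, the YYYYMMDD layout of PK_ANYOMESDIA is inferred from the example value 20210207, and the label texts are copied from the field descriptions.

```python
from datetime import date, datetime

# Code tables copied from the field descriptions above.
FRAUD_LABELS = {0: "No fraud", 1: "Intent of fraud", 2: "Completed fraud"}
GLOBAL_POSITION_LABELS = {
    1: "<1000",
    2: "1000-10000",
    3: "10000-50000",
    4: "50000-250000",
    5: ">250000",
    -2: "Data not found",
}


def parse_operation_day(pk_anyomesdia) -> date:
    """Parse PK_ANYOMESDIA, assuming a YYYYMMDD layout."""
    return datetime.strptime(str(pk_anyomesdia), "%Y%m%d").date()


def decode_record(record: dict) -> dict:
    """Translate selected coded fields of one record into readable values."""
    return {
        "fraud": FRAUD_LABELS.get(int(record["Fraud"]), "Unknown code"),
        "operation_day": parse_operation_day(record["PK_ANYOMESDIA"]),
        "global_position": GLOBAL_POSITION_LABELS.get(
            int(record["POSICION_GLOBAL_ANTES_PRESTAMO"]), "Unknown code"
        ),
    }


# Example values taken from the "Value example" column.
example = {
    "Fraud": "0",
    "PK_ANYOMESDIA": "20210207",
    "POSICION_GLOBAL_ANTES_PRESTAMO": "1",
}
decoded = decode_record(example)
```

Unknown codes fall back to a placeholder label instead of raising, which keeps a batch decode from failing on a single unexpected value.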
This dataset was created by Anoop E R