Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains an unquantified element of refinancing of existing mortgages (e.g. involving the redemption of an existing mortgage and its replacement with a mortgage from a different lender).
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
The Federal Housing Administration's HECM program is the only government-insured reverse mortgage program. The HECM program guarantees that the lender will meet its payment obligations to the homeowner, limits the borrower's loan origination costs, and insures full repayment of the loan balance to the lender up to the maximum claim amount. The loan amount is based on borrower age, home value, and current interest rates. The HECM data files provide loan-level records that will enable interested parties to explore issues regarding downpayment assistance provided to homebuyers utilizing HECM insured mortgage financing.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fixed 30-year mortgage rates in the United States averaged 6.77 percent in the week ending July 4 of 2025. This dataset provides the latest reported value for - United States MBA 30-Yr Mortgage Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mortgage applications in the United States increased by 9.40 percent in the week ending July 4 of 2025 over the previous week. This dataset provides - United States MBA Mortgage Applications - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Home Mortgage Disclosure Act (HMDA) requires many financial institutions to maintain, report, and publicly disclose loan-level information about mortgages. These data help show whether lenders are serving the housing needs of their communities; they give public officials information that helps them make decisions and policies; and they shed light on lending patterns that could be discriminatory. The public data are modified to protect applicant and borrower privacy.
HMDA was originally enacted by Congress in 1975 and is implemented by Regulation C.
Banks incur losses when customers do not repay their loans on time. As a result, banks lose crores every year, which also affects the country's economic growth to a large extent. We look at various attributes such as funded amount, location, loan, balance, etc., to predict whether a person will be a loan defaulter.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
30 Year Mortgage Rate in the United States increased to 6.72 percent on July 10 from 6.67 percent in the previous week. This dataset includes a chart with historical data for the United States 30 Year Mortgage Rate.
loan.csv:
In this file there are 18 columns:
loanId: This is a unique loan identifier. Use this for joins with the payment.csv file.
anon_ssn: This is a hash based on a client’s SSN (anonymous SSN). You can use this as if it were an SSN to check whether a loan belongs to a previous customer.
payFrequency: This column represents the repayment frequency of the loan:
  B is biweekly payments
  I is irregular
  M is monthly
  S is semi-monthly
  W is weekly
apr: Annual Percentage Rate of the loan (%)
applicationDate: Date of application (start date)
originated: Indicates if the loan has been initiated (underwriting process started).
originatedDate: Date of origination, the day the loan was originated
nPaidOff: Number of MoneyLion loans previously paid off by the client.
approved: Indicates if the loan has been approved (final step of underwriting).
isFunded: Whether or not a loan is ultimately funded. A loan can be voided by a customer shortly after it is approved, so not all approved loans are ultimately funded.
loanStatus: Current loan status (this column is used for prediction). Most are self-explanatory. Below are the statuses which need clarification:
  Withdrawn Application: The applicant has withdrawn their loan application before it was approved or funded.
  Paid Off Loan: The loan has been fully paid off by the borrower according to the repayment terms.
  Rejected: The loan application was rejected, typically due to failure to meet underwriting criteria.
  New Loan: A newly approved loan that has not yet been funded.
  Internal Collection: The loan is being managed and collected internally by MoneyLion due to missed payments or delinquency.
  CSR Voided New Loan: A new loan application was voided by a customer service representative (CSR) before funding.
  External Collection: The loan has been transferred to an external collection agency for management and collection.
  Returned Item: A payment on the loan has been returned due to insufficient funds in the borrower's account.
  Customer Voided New Loan: The borrower voided a new loan application before funding.
  Credit Return Void: The loan was voided due to a credit return, typically related to a refunded transaction.
  Pending Paid Off: The loan is in the process of being paid off, but the process is pending completion.
  Charged Off Paid Off: The loan has been charged off as a loss by MoneyLion but has also been paid off by the borrower.
  Settled Bankruptcy: The loan has been settled as part of a bankruptcy proceeding.
  Settlement Paid Off: The loan has been paid off through a settlement agreement.
  Charged Off: The loan has been charged off as a loss by MoneyLion due to nonpayment.
  Pending Rescind: The loan is pending rescission, meaning it may be canceled or reversed.
  Customver Voided New Loan: A typo in the data that likely should be "Customer Voided New Loan", indicating the borrower voided a new loan application before funding.
  Pending Application: The loan application is pending review and approval.
  Voided New Loan: The loan application was voided before funding.
  Pending Application Fee: The loan application is pending due to the application fee not being paid.
  Settlement Pending Paid Off: The loan is pending being paid off through a settlement agreement.
loanAmount: Principal amount of the loan ('Dollars') (for non-funded loans this will be the principal in the loan application)
originallyScheduledPaymentAmount: This is the initially scheduled repayment amount ('Dollars') (if a customer pays off all his scheduled payments, this is the amount we should receive)
state: State of the client
Lead type: The lead type determines the underwriting rules for a lead.
  bvMandatory: leads that are bought from the ping tree – required to perform bank verification before loan approval
  lead: very similar to bvMandatory, except bank verification is optional for loan approval
  california: similar to lead, but optimized for California lending rules
  organic: customers that came through the MoneyLion website
  rc_returning: customers who have at least 1 paid off loan in another loan portfolio. (The first paid off loan is not in this data set).
  prescreen: preselected customers who have been offered a loan through direct mail campaigns
  express: promotional “express” loans
  repeat: promotional loans offered through ...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source: From lending institutions
This data contains an unquantified element of refinancing of existing mortgages (e.g. involving the redemption of an existing mortgage and its replacement with a mortgage from a different lender).
This data is not directly comparable with post 2007 data from IBF
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
Source: From lending institutions and local authorities
The loan payments dataset stops in 2007.
This data contains an unquantified element of refinancing of existing mortgages (e.g. involving the redemption of an existing mortgage and its replacement with a mortgage from a different lender).
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Source: From lending institutions
This data contains an unquantified element of refinancing of existing mortgages (e.g. involving the redemption of an existing mortgage and its replacement with a mortgage from a different lender).
This data is not directly comparable with post 2007 data from IBF
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Concept: 90 days past due loans by source of funds and type of credit - microenterprise - earmarked credit - Housing financing - mortgage
Source: Credit Information System
The FHFA stress test is updated each quarter according to objective rules derived from fundamental economic relationships. These rules effect a dynamic adjustment to the severity of the stress test that accounts for current economic conditions, specifically the current level of house prices relative to the ongoing house price cycle. The stress test incorporates different house-price level (HPI) stress paths for each state, thus accounting for the fact that house price cycles can differ significantly from one state or region to another. The severity of the economic stress imposed by the test, as measured by the projected percentage drop in HPI, changes over time for each state corresponding to the deviation of current HPI from its long-run trend. As a result of this design, the FHFA stress test will produce countercyclical economic capital requirements, in that the estimates of potential losses on new mortgage loan originations increase during economic expansions, as current HPI rises above its long-term trend, and decrease during economic contractions, as current HPI falls to or below trend. The dynamic adjustment feature allows the stress test to accommodate a current house price cycle of any size, even one of greater amplitude than any observed previously. Further, the severity of the stress test is calibrated to produce economic capital requirements that are sufficient, as of the day of origination, to fully capitalize the mortgage assets for the life of those assets.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains an unquantified element of refinancing of existing mortgages (e.g. involving the redemption of an existing mortgage and its replacement with a mortgage from a different lender). The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
Our Home Ownership Mortgage Database is rebuilt every two months and contains information on more than 50 million US homeowners. The data is collected from county recorder and assessor offices.
The file is processed via National Change of Address (NCOA) to ensure deliverability. Additionally, the data is passed against suppression files to eliminate consumers or telephone numbers as appropriate such as Decease File, State Attorney General (SAG) data, the Direct Marketing Association's (DMA) do-not-mail and do-not-call lists, and the national FTC do-not-call file.
Selections include mortgage loan and property attributes along with household, individual and neighborhood demographics.
Lending Club offers peer-to-peer (P2P) loans through a technological platform for various personal finance purposes and is today one of the companies that dominate the US P2P lending market. The original dataset is publicly available on Kaggle and corresponds to all the loans issued by Lending Club between 2007 and 2018. The present version of the dataset is for constructing a granting model, that is, a model designed to make decisions on whether to grant a loan based on information available at the time of the loan application. Consequently, our dataset only has a selection of variables from the original one, namely the variables known at the moment the loan request is made. Furthermore, the target variable of a granting model represents the final status of the loan, which is either "default" or "fully paid". Thus, we filtered out of the original dataset all the loans in transitory states. Our dataset comprises 1,347,681 records or obligations (approximately 60% of the original), and it was also cleaned for completeness and consistency (less than 1% of our dataset was filtered out).
TARGET VARIABLE
The dataset includes a target variable based on the final resolution of the credit: the default category corresponds to the event charged off and the non-default category to the event fully paid. It does not consider other values in the loan status variable since this variable represents the state of the loan at the end of the considered time window. Thus, there are no loans in transitory states. The original dataset includes the target variable “loan status”, which contains several categories ('Fully Paid', 'Current', 'Charged Off', 'In Grace Period', 'Late (31-120 days)', 'Late (16-30 days)', 'Default'). However, in our dataset, we just consider loans that are either “Fully Paid” or “Default” and transform this variable into a binary variable called “Default”, with a 0 for fully paid loans and a 1 for defaulted loans.
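The construction of the binary target can be sketched as follows. This is a minimal illustration rather than the authors' actual preprocessing: it assumes the default category corresponds to the "Charged Off" status, as stated above, and that the raw status lives in a field named `loan_status`; the helper name `build_default_target` is hypothetical.

```python
# Assumption: "Charged Off" is the default event and "Fully Paid" the non-default event.
FINAL_STATUS_TO_DEFAULT = {"Fully Paid": 0, "Charged Off": 1}


def build_default_target(records):
    """Keep only loans in a final state and attach a binary 'Default' field."""
    kept = []
    for record in records:
        status = record.get("loan_status")  # assumed field name
        if status in FINAL_STATUS_TO_DEFAULT:
            kept.append({**record, "Default": FINAL_STATUS_TO_DEFAULT[status]})
    return kept


# Illustrative rows only; these are not taken from the dataset.
sample = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": "Current"},
    {"id": 3, "loan_status": "Charged Off"},
]
labelled = build_default_target(sample)
```

Loans in transitory states such as "Current" are dropped rather than labelled, which mirrors the filtering described above.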
EXPLANATORY VARIABLES
The explanatory variables that we use correspond only to the information available at the time of the application. Variables such as the interest rate, grade, or subgrade are generated by the company as a result of a credit risk assessment process, so they were filtered out from the dataset as they must not be considered in risk models to predict the default in granting of credit.
FULL LIST OF VARIABLES
Loan identification variables:
id: Loan id (unique identifier).
issue_d: Month and year in which the loan was approved.
Quantitative variables:
revenue: Borrower's self-declared annual income during registration.
dti_n: Indebtedness ratio for obligations excluding mortgage. Monthly information. This ratio has been calculated considering the indebtedness of the whole group of applicants. It is estimated as the co-borrowers’ total payments on their total debt obligations divided by the co-borrowers’ combined monthly income.
loan_amnt: Amount of credit requested by the borrower.
fico_n: Defined between 300 and 850, reported by Fair Isaac Corporation as a risk measure based on historical credit information reported at the time of application. This value has been calculated as the average of the variables “fico_range_low” and “fico_range_high” in the original dataset.
experience_c: Binary variable that indicates whether the borrower is new to the entity. This variable is constructed from the credit date of the previous obligation in LC and the credit date of the current obligation; if the difference between dates is positive, it is not considered as a new experience with LC.
Categorical variables:
emp_length: Categorical variable with the employment length of the borrower (includes the no information category)
purpose: Credit purpose category for the loan request.
home_ownership_n: Homeownership status provided by the borrower in the registration process. Categories defined by LC: “mortgage”, “rent”, “own”, “other”, “any”, “none”. We merged the categories “other”, “any” and “none” as “other”.
addr_state: Borrower's residence state from the USA.
zip_code: Zip code of the borrower's residence.
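Two of the derived variables listed above, fico_n and home_ownership_n, follow simple rules that can be sketched in a few lines. The function names are hypothetical, and lower-casing the raw ownership value is an assumption made so that the categories match the lowercase labels used in this description.

```python
def fico_n(fico_range_low, fico_range_high):
    """Average of the two FICO bounds from the original dataset."""
    return (fico_range_low + fico_range_high) / 2


# Categories merged into "other", as described for home_ownership_n.
MERGED_INTO_OTHER = {"other", "any", "none"}


def home_ownership_n(raw_value):
    """Normalise the ownership category and merge the rare ones into 'other'."""
    value = raw_value.strip().lower()  # assumption: raw values may differ in case
    return "other" if value in MERGED_INTO_OTHER else value
```

For example, `fico_n(700, 704)` gives `702.0`, and `home_ownership_n("ANY")` gives `"other"`.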
Textual variables
title: Title of the credit request description provided by the borrower.
desc: Description of the credit request provided by the borrower.
We cleaned the textual variables. First, we removed all descriptions that contained the default description provided by Lending Club on its web form (“Tell your story. What is your loan for?”). Moreover, we removed the prefix “Borrower added on DD/MM/YYYY >” from the descriptions to avoid any temporal background on them. Finally, as these descriptions came from a web form, we replaced all HTML entities with their corresponding characters (e.g. “&amp;” was replaced by “&”, “&lt;” was replaced by “<”, etc.).
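The three cleaning steps can be sketched with the Python standard library. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name `clean_description` is hypothetical, the date in the prefix is assumed to be written with two-digit day and month and a four-digit year, and a removed description is represented by `None`.

```python
import html
import re

# Default text of the Lending Club web form, as quoted above.
DEFAULT_PROMPT = "Tell your story. What is your loan for?"

# Assumed shape of the prefix: "Borrower added on DD/MM/YYYY >".
PREFIX = re.compile(r"Borrower added on \d{2}/\d{2}/\d{4} >\s*")


def clean_description(desc):
    """Apply the three cleaning steps; return None when the text is discarded."""
    if desc is None or DEFAULT_PROMPT in desc:
        return None  # step 1: drop descriptions containing the default prompt
    text = PREFIX.sub("", desc)  # step 2: remove the dated prefix
    return html.unescape(text).strip()  # step 3: decode HTML entities
```

`html.unescape` handles named and numeric entities alike, so there is no need to list each substitution by hand.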
RELATED WORKS
This dataset has been used in the following academic articles:
Sanz-Guerrero, M., Arroyo, J. (2024). Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending. arXiv preprint arXiv:2401.16458. https://doi.org/10.48550/arXiv.2401.16458
Ariza-Garzón, M.J., Arroyo, J., Caparrini, A., Segovia-Vargas, M.J. (2020). Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access 8, 64873-64890. https://doi.org/10.1109/ACCESS.2020.2984412
Source: From lending institutions and local authorities
The loan payments dataset stops in 2007.
The figures on fixed interest rate mortgages relate to mortgages which provide that the rate of interest may not be changed, or may only be changed at intervals of not less than one year.
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
https://creativecommons.org/publicdomain/zero/1.0/
This is a classification problem. The dataset contains 13 columns; the Loan_Status column is the one we have to predict.
Variable | Description |
---|---|
Loan_ID | Unique Loan ID |
Gender | Male/ Female |
Married | Applicant married (Y/N) |
Dependents | Number of dependents |
Education | Applicant Education (Graduate/ Under Graduate) |
Self_Employed | Self employed (Y/N) |
ApplicantIncome | Applicant income |
CoapplicantIncome | Coapplicant income |
LoanAmount | Loan amount in thousands |
Loan_Amount_Term | Term of loan in months |
Credit_History | credit history meets guidelines |
Property_Area | Urban/ Semi Urban/ Rural |
Loan_Status | (Target) Loan approved (Y/N) |
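As a small sketch of how a row from the table above might be prepared for a classifier, the snippet below encodes the Y/N target as 1/0 and converts LoanAmount from thousands to units. The function name `encode_row`, the output keys, and the sample values are all hypothetical; the source does not state the currency or how values are stored.

```python
def encode_row(row):
    """Convert one raw record (a dict of strings) into model-ready values."""
    return {
        "loan_id": row["Loan_ID"],
        # LoanAmount is documented as being in thousands.
        "loan_amount": float(row["LoanAmount"]) * 1000,
        # Target: 1 when the loan was approved (Y), 0 otherwise.
        "approved": 1 if row["Loan_Status"] == "Y" else 0,
    }


# Illustrative record only; not taken from the dataset.
example = encode_row({"Loan_ID": "LP001", "LoanAmount": "128", "Loan_Status": "Y"})
```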
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Overall Loan Approvals by year’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/https-data-usmart-io-org-ae1d5c14-c392-4c3f-9705-537427eeb413-dataset-viewdiscovery-datasetguid-25ad382a-6039-4053-8544-b7d074cde78e on 11 January 2022.
--- Dataset description provided by original source is as follows ---
Source: From lending institutions and local authorities
This data contains an unquantified element of refinancing of existing mortgages (e.g. involving the redemption of an existing mortgage and its replacement with a mortgage from a different lender).
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Mortgage Size in the United States increased to 379.21 Thousand USD on May 31 from 376.99 Thousand USD in the previous week. This dataset includes a chart with historical data for the United States Average Mortgage Size.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains an unquantified element of refinancing of existing mortgages (e.g. involving the redemption of an existing mortgage and its replacement with a mortgage from a different lender).
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.