Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 30-year mortgage rate in the United States decreased to 6.85 percent as of June 5, from 6.89 percent the previous week. This dataset includes a chart with historical data for the United States 30-year mortgage rate.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fixed 30-year mortgage rates in the United States averaged 6.92 percent in the week ending May 30, 2025. This dataset provides the latest reported value for the United States MBA 30-Year Mortgage Rate, plus previous releases, historical highs and lows, a short-term forecast and long-term prediction, an economic calendar, survey consensus, and news.
By Natarajan Krishnaswami [source]
The FHFA Public Use Databases provide an unprecedented look into the flow of mortgage credit and capital in America's communities. With detailed information about the income, race, gender and census tract location of borrowers, this database can help lenders, planners, researchers and housing advocates better understand how mortgages are acquired by Fannie Mae and Freddie Mac.
This data set includes 2009-2016 single-family property loan information from the Enterprises in combination with corresponding census tract information from the 2010 decennial census. It allows for greater granularity in examining mortgage acquisition patterns within each MSA or county by combining borrower/property characteristics, such as borrower's race/ethnicity; co-borrower demographics; occupancy type; Federal guarantee program (conventional/other versus FHA-insured); age of borrowers; loan purpose (purchase, refinance or home improvement); lien status; rate spread between annual percentage rate (APR) and average prime offer rate (APOR); HOEPA status; area median family income and more.
In addition to demographic data on borrowers and properties, this dataset provides affordability metrics such as median family income at both the MSA/county level and the census tract level. These use 2010 Census geography together with American Community Survey estimates available as of January 1, 2016. With them you can calculate metrics that matter for assessing inequality, such as the tract income ratio (a tract's median family income relative to the area median family income) and the borrower income ratio (a borrower's annual income relative to the area median family income for the reporting year). Finally, each record contains an Enterprise Flag indicating whether the loan was purchased by Fannie Mae or Freddie Mac. This flag offers further insight into who finances loans affected by policies such as those on undocumented immigrant labor access and affordable housing legislation targeted toward first-time homebuyers.
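The two ratios described above are simple quotients. Here is a minimal sketch, assuming both incomes are expressed in the same units. The function names are hypothetical, and the actual files may store these ratios differently (for example, as percentages or buckets).

```python
def borrower_income_ratio(borrower_income: float, area_mfi: float) -> float:
    """Borrower's annual income relative to the area median family income."""
    if area_mfi <= 0:
        raise ValueError("area median family income must be positive")
    return borrower_income / area_mfi


def tract_income_ratio(tract_mfi: float, area_mfi: float) -> float:
    """Tract median family income relative to the area median family income."""
    if area_mfi <= 0:
        raise ValueError("area median family income must be positive")
    return tract_mfi / area_mfi


# Hypothetical dollar amounts, for illustration only.
print(borrower_income_ratio(45_000, 75_000))  # 0.6
print(tract_income_ratio(60_000, 75_000))     # 0.8
```

A ratio below 1.0 means the borrower (or tract) sits below the area median.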
This guide provides the information needed to use the Fannie Mae and Freddie Mac Loan-Level Dataset for 2016. The dataset contains loan-level data for both Fannie Mae and Freddie Mac, including loans acquired in 2016. It includes details such as homeowner demographics, loan-to-value ratio, census tract location, and mortgage affordability.
The first step in using this dataset is understanding how it is organized. The loan-level dataset has 38 fields. Each field comes with a description of what it represents and the values it can take (e.g., whether it is an integer or a float). Understanding the fields makes it easier to query specific data points or to compare and contrast subsets.
Once you understand what information is available, you can write queries or build visualizations that compare trends across Fannie Mae and Freddie Mac loans made in 2016. Depending on your interests, such as homeownership rates or income disparities, you might compute the borrower's annual income ratio to area median family income by state code. You might also compare the race and ethnicity breakdown of borrowers and co-borrowers across states and their MSAs, among other possibilities. Visualizations make these comparisons easier to see for other users who explore the same dataset.
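As a sketch of the kind of query described above, the function below averages a value per group using only the Python standard library. The record keys `state_code` and `borrower_income_ratio` are hypothetical placeholders, not the dataset's actual column names.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_key, value_key):
    """Average value_key within each group_key, skipping records where it is missing."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(value_key)
        if value is not None:
            groups[rec[group_key]].append(value)
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical records; real column names and codes may differ.
loans = [
    {"state_code": "06", "borrower_income_ratio": 0.8},
    {"state_code": "06", "borrower_income_ratio": 1.2},
    {"state_code": "36", "borrower_income_ratio": 0.9},
    {"state_code": "36", "borrower_income_ratio": None},
]
print(mean_by_group(loans, "state_code", "borrower_income_ratio"))
```

The same function works for any grouping column (for example, an MSA code) once the real field names are known.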
After creating queries and visualizations, you can research the corresponding trends, and any biases they reveal, across racial groups by state and MSA code within the 2010 Census tract geography. Publicly available research on housing policies implemented over the years can then help you draw conclusions relevant to your own questions.
- Use the dataset to analyze borrowing patterns based on race, nationality and gender, to better understand the links between minority groups and access to credit...
DESCRIPTION
Create a model that predicts whether or not a loan will default, using the historical data.
Problem Statement:
For companies like Lending Club, correctly predicting whether or not a loan will default is very important. In this project, you will use historical data from 2007 to 2015 to build a deep learning model that predicts the chance of default for future loans. As you will see, the dataset is highly imbalanced and includes many features, which makes the problem more challenging.
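One common way to handle the class imbalance mentioned above is to weight the loss by inverse class frequency. The sketch below uses the heuristic n_samples / (n_classes * class_count), which is the same formula as scikit-learn's "balanced" option. The 84/16 label split is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rarer classes receive proportionally larger weights, so in aggregate
    each class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 = default, 0 = fully paid.
labels = [0] * 84 + [1] * 16
weights = balanced_class_weights(labels)
print(weights)  # class 1 (default) gets 3.125; class 0 gets about 0.595
```

You can pass the resulting dict to the `class_weight` argument of Keras `Model.fit`.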
Domain: Finance
Analysis to be done: Perform data preprocessing and build a deep learning prediction model.
Content:
Dataset columns and definition:
credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").
int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
installment: The monthly installments owed by the borrower if the loan is funded.
log.annual.inc: The natural log of the self-reported annual income of the borrower.
dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
fico: The FICO credit score of the borrower.
days.with.cr.line: The number of days the borrower has had a credit line.
revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).
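`purpose` is the only categorical column in the list above. The first task below (turning categorical values into numbers) can therefore be sketched as a one-hot encoding over the six listed values. Dropping one category as a baseline (`drop_first=True`) is a choice made here, not one prescribed by the task. It avoids perfectly collinear dummy columns.

```python
PURPOSES = [
    "all_other",
    "credit_card",
    "debt_consolidation",
    "educational",
    "major_purchase",
    "small_business",
]


def one_hot_purpose(purpose: str, drop_first: bool = True) -> dict:
    """Return {"purpose_<category>": 0 or 1} for one loan's purpose value."""
    if purpose not in PURPOSES:
        raise ValueError(f"unknown purpose: {purpose!r}")
    # With drop_first, "all_other" is the baseline and is encoded as all zeros.
    categories = PURPOSES[1:] if drop_first else PURPOSES
    return {f"purpose_{c}": int(purpose == c) for c in categories}


print(one_hot_purpose("credit_card"))
```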
Steps to perform:
Perform exploratory data analysis and feature engineering, then build a deep learning model that predicts whether or not a loan will default using the historical data.
Tasks:
- Transform categorical values into numerical (discrete) values.
- Perform exploratory data analysis of the different factors in the dataset.
- Perform additional feature engineering: check the correlation between features and drop features that are strongly correlated with one another. This reduces the number of features and leaves you with the most relevant ones.
- After EDA and feature engineering, build the predictive model: a deep learning model using Keras with the TensorFlow backend.
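The correlation step can be sketched in plain Python. The code computes Pearson's r for every feature pair and greedily drops the second feature of any pair whose absolute correlation exceeds a threshold. The 0.9 threshold and the toy columns are assumptions for illustration; in practice you would run this on the numeric training features only.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep features in order, dropping the later one of each highly correlated pair."""
    names = list(columns)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) > threshold:
            dropped.add(b)
    return [name for name in names if name not in dropped]


# Toy data: int.rate falls linearly as fico rises, so one of them is redundant.
columns = {
    "fico": [700, 720, 680, 750],
    "int.rate": [0.12, 0.10, 0.14, 0.07],
    "dti": [10.0, 25.0, 5.0, 18.0],
}
print(drop_correlated(columns))  # ['fico', 'dti']
```

Which member of a correlated pair to keep is a judgment call. This sketch simply keeps the one that appears first.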
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in the United States was last recorded at 4.50 percent. This dataset provides the latest reported value for the United States Fed Funds Rate, plus previous releases, historical highs and lows, a short-term forecast and long-term prediction, an economic calendar, survey consensus, and news.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Lending Interest Rate data was reported at 3.512 % pa in 2016, an increase from 3.260 % pa in 2015. The data is updated yearly, with a median of 6.922 % pa over 57 observations from Dec 1960 to 2016. It reached an all-time high of 18.870 % pa in 1981 and a record low of 3.250 % pa in 2014. The series remains active in CEIC and is reported by the World Bank. It is categorized under Global Database’s United States – Table US.World Bank.WDI: Interest Rates. Lending rate is the bank rate that usually meets the short- and medium-term financing needs of the private sector. This rate is normally differentiated according to the creditworthiness of borrowers and the objectives of financing. The terms and conditions attached to these rates differ by country, however, limiting their comparability. Source: International Monetary Fund, International Financial Statistics and data files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Iran IR: Lending Interest Rate data was reported at 18.000 % pa in 2016, an increase from 14.210 % pa in 2015. The data is updated yearly, with a median of 12.000 % pa over 13 observations from Dec 2004 to 2016. It reached an all-time high of 18.000 % pa in 2016 and a record low of 11.000 % pa in 2013. The series remains active in CEIC and is reported by the World Bank. It is categorized under Global Database’s Iran – Table IR.World Bank.WDI: Interest Rates. Lending rate is the bank rate that usually meets the short- and medium-term financing needs of the private sector. This rate is normally differentiated according to the creditworthiness of borrowers and the objectives of financing. The terms and conditions attached to these rates differ by country, however, limiting their comparability. Source: International Monetary Fund, International Financial Statistics and data files.
Lending Club offers peer-to-peer (P2P) loans through a technology platform for various personal finance purposes and is one of the companies that dominate the US P2P lending market. The original dataset is publicly available on Kaggle and covers all loans issued by Lending Club between 2007 and 2018. The present version of the dataset is for constructing a granting model, that is, a model designed to decide whether to grant a loan based on information available at the time of the loan application. Consequently, our dataset keeps only a selection of the original variables: those known at the moment the loan request is made. Furthermore, the target variable of a granting model represents the final status of the loan, which is either "default" or "fully paid", so we filtered out all loans in transitory states from the original dataset. Our dataset comprises 1,347,681 records (obligations), approximately 60% of the original. It was also cleaned for completeness and consistency (less than 1% of our dataset was filtered out).
TARGET VARIABLE
The dataset includes a target variable based on the final resolution of the credit: the default category corresponds to the event charged off and the non-default category to the event fully paid. It does not consider other values in the loan status variable since this variable represents the state of the loan at the end of the considered time window. Thus, there are no loans in transitory states. The original dataset includes the target variable “loan status”, which contains several categories ('Fully Paid', 'Current', 'Charged Off', 'In Grace Period', 'Late (31-120 days)', 'Late (16-30 days)', 'Default'). However, in our dataset, we just consider loans that are either “Fully Paid” or “Default” and transform this variable into a binary variable called “Default”, with a 0 for fully paid loans and a 1 for defaulted loans.
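Here is a sketch of the target construction described above. It assumes records with a `loan_status` field and treats "Charged Off" as the default event, as stated at the start of this section. Any status not in the mapping is treated as transitory and dropped.

```python
# Mapping from the description: fully paid -> 0, charged off (default) -> 1.
TARGET_MAP = {"Fully Paid": 0, "Charged Off": 1}


def add_default_target(loans):
    """Keep resolved loans only and attach a binary 'Default' label to each."""
    resolved = []
    for loan in loans:
        label = TARGET_MAP.get(loan["loan_status"])
        if label is None:
            continue  # transitory state, e.g. 'Current' or 'In Grace Period'
        resolved.append({**loan, "Default": label})
    return resolved


sample = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": "Current"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "Late (31-120 days)"},
]
print([(loan["id"], loan["Default"]) for loan in add_default_target(sample)])  # [(1, 0), (3, 1)]
```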
EXPLANATORY VARIABLES
The explanatory variables correspond only to information available at the time of the application. Variables such as the interest rate, grade, or subgrade are generated by the company as the result of its own credit risk assessment. They were therefore filtered out of the dataset, because they must not be used in risk models that predict default at the time credit is granted.
FULL LIST OF VARIABLES
Loan identification variables:
id: Loan id (unique identifier).
issue_d: Month and year in which the loan was approved.
Quantitative variables:
revenue: Borrower's self-declared annual income during registration.
dti_n: Indebtedness ratio for obligations excluding mortgage. Monthly information. This ratio has been calculated considering the indebtedness of the whole group of applicants. It is estimated as the ratio calculated using the co-borrowers’ total payments on the total debt obligations divided by the co-borrowers’ combined monthly income.
loan_amnt: Amount of credit requested by the borrower.
fico_n: Defined between 300 and 850, reported by Fair Isaac Corporation as a risk measure based on historical credit information reported at the time of application. This value has been calculated as the average of the variables “fico_range_low” and “fico_range_high” in the original dataset.
experience_c: Binary variable that indicates whether the borrower is new to the entity (LC). It is constructed from the credit date of the borrower's previous obligation with LC and the credit date of the current obligation. If the difference between these dates is positive, the loan is not considered a new experience with LC.
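`fico_n` is the midpoint of the two original FICO bounds. A minimal sketch follows. The range check is an added assumption based on the 300 to 850 scale stated above.

```python
def fico_n(fico_range_low: float, fico_range_high: float) -> float:
    """Midpoint of the reported FICO range, as described for fico_n."""
    # Both bounds should fall within the 300-850 FICO scale.
    if not 300 <= fico_range_low <= fico_range_high <= 850:
        raise ValueError("expected 300 <= fico_range_low <= fico_range_high <= 850")
    return (fico_range_low + fico_range_high) / 2


print(fico_n(700, 704))  # 702.0
```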
Categorical variables:
emp_length: Categorical variable with the employment length of the borrower (includes the no information category)
purpose: Credit purpose category for the loan request.
home_ownership_n: Homeownership status provided by the borrower in the registration process. Categories defined by LC: “mortgage”, “rent”, “own”, “other”, “any”, “none”. We merged the categories “other”, “any” and “none” as “other”.
addr_state: Borrower's residence state from the USA.
zip_code: Zip code of the borrower's residence.
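Here is a sketch of the category merge for `home_ownership_n`. Lower-casing the raw value is an assumption, since the original Lending Club data may store these categories in upper case.

```python
# Categories merged into "other", per the description of home_ownership_n.
MERGE_TO_OTHER = {"other", "any", "none"}


def home_ownership_n(raw: str) -> str:
    """Normalize a raw home-ownership value and merge rare categories into 'other'."""
    value = raw.strip().lower()  # assumption: raw values may be upper case
    return "other" if value in MERGE_TO_OTHER else value


print([home_ownership_n(v) for v in ["MORTGAGE", "RENT", "OWN", "ANY", "NONE", "OTHER"]])
# ['mortgage', 'rent', 'own', 'other', 'other', 'other']
```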
Textual variables
title: Title of the credit request description provided by the borrower.
desc: Description of the credit request provided by the borrower.
We cleaned the textual variables. First, we removed all descriptions that contained the default text from Lending Club's web form (“Tell your story. What is your loan for?”). Next, we removed the prefix “Borrower added on DD/MM/YYYY >” from the descriptions to strip any temporal context from them. Finally, because these descriptions came from a web form, we replaced all HTML entities with the characters they represent (e.g., “&amp;” was replaced by “&” and “&lt;” by “<”).
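A minimal sketch of these three cleaning steps using Python's `re` and `html` modules. The exact date format in the prefix is an assumption, so the pattern accepts one- or two-digit days and months and two- or four-digit years. HTML entities are decoded first, in case the `>` of the prefix was itself encoded.

```python
import html
import re

# Default prompt from Lending Club's web form, quoted in the description above.
DEFAULT_PROMPT = "Tell your story. What is your loan for?"
# Assumed prefix shape: "Borrower added on <d/m/y> >" with numeric date parts.
PREFIX_RE = re.compile(r"Borrower added on \d{1,2}/\d{1,2}/\d{2,4}\s*>\s*")


def clean_description(desc):
    """Return the cleaned description, or None if it should be discarded."""
    if desc is None:
        return None
    text = html.unescape(desc)  # e.g. "&amp;" -> "&", "&lt;" -> "<"
    if DEFAULT_PROMPT in text:
        return None  # description contains the form's default prompt
    text = PREFIX_RE.sub("", text)  # drop every "Borrower added on ... >" prefix
    return text.strip()


print(clean_description("Borrower added on 12/22/2011 > Pay off cards &amp; car"))
# Pay off cards & car
```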
RELATED WORKS
This dataset has been used in the following academic articles:
Sanz-Guerrero, M., Arroyo, J. (2024). Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending. arXiv preprint arXiv:2401.16458. https://doi.org/10.48550/arXiv.2401.16458
Ariza-Garzón, M.J., Arroyo, J., Caparrini, A., Segovia-Vargas, M.J. (2020). Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access 8, 64873 - 64890. https://doi.org/10.1109/ACCESS.2020.2984412
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains customer data from a loan company called Prosper. It comprises 113,937 loans with 81 variables per loan, including loan amount, borrower rate (interest rate), current loan status, borrower income, and many others.
Definition of Variables:
ListingKey: Unique key for each listing; the same value as the 'key' used in the listing object in the API.
ListingNumber: The number that uniquely identifies the listing to the public, as displayed on the website.
ListingCreationDate: The date the listing was created.
CreditGrade: The credit rating assigned at the time the listing went live. Applicable only to listings from the pre-2009 period, and populated only for those listings.
Term: The length of the loan, in months.
LoanStatus: The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue. The PastDue status is accompanied by a delinquency bucket.
ClosedDate: The closed date, applicable to the Cancelled, Completed, Chargedoff and Defaulted loan statuses.
BorrowerAPR: The borrower's annual percentage rate (APR) for the loan.
BorrowerRate: The borrower's interest rate for this loan.
LenderYield: The lender yield on the loan, equal to the interest rate on the loan less the servicing fee.
EstimatedEffectiveYield: Effective yield equals the borrower interest rate (i) minus the servicing fee rate, (ii) minus estimated uncollected interest on charge-offs, (iii) plus estimated collected late fees. Applicable to loans originated after July 2009.
EstimatedLoss: The estimated principal loss on charge-offs. Applicable to loans originated after July 2009.
EstimatedReturn: The estimated return assigned to the listing at the time it was created, equal to the difference between the Estimated Effective Yield and the Estimated Loss Rate. Applicable to loans originated after July 2009.
ProsperRating (numeric): The Prosper Rating assigned at the time the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA. Applicable to loans originated after July 2009.
ProsperRating (Alpha): The Prosper Rating assigned at the time the listing was created, from AA to HR. Applicable to loans originated after July 2009.
ProsperScore: A custom risk score built using historical Prosper data. The score ranges from 1 to 10, with 10 being the best (lowest-risk) score. Applicable to loans originated after July 2009.
ListingCategory: The category the borrower selected when posting the listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7 - Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans.
BorrowerState: The two-letter abbreviation of the state of the borrower's address at the time the listing was created.
Occupation: The occupation selected by the borrower when creating the listing.
EmploymentStatus: The borrower's employment status at the time they posted the listing.
EmploymentStatusDuration: The length, in months, of the employment status at the time the listing was created.
IsBorrowerHomeowner: A borrower is classified as a homeowner if they have a mortgage on their credit profile or provide documentation confirming they are a homeowner.
CurrentlyInGroup: Whether the borrower was in a group at the time the listing was created.
GroupKey: The key of the group of which the borrower is a member. Null if the borrower has no group affiliation.
DateCreditPulled: The date the credit profile was pulled.
CreditScoreRangeLower: The lower value of the range of the borrower's credit score, as provided by a consumer credit rating agency.
CreditScoreRangeUpper: The upper value of the range of the borrower's credit score, as provided by a consumer credit rating agency.
FirstRecordedCreditLine: The date the first credit line was opened.
CurrentCreditLines: Number of current credit lines at the time the credit profile was pulled.
OpenCreditLines: Number of open credit lines at the time the credit profile was pulled.
TotalCreditLinespast7years: Number of credit lines in the past seven years at the time the credit profile was pulled.
OpenRevolvingAccounts: Number of open revolving accounts at the time the credit profile was pulled.
OpenRevolvingMonthlyPayment: Monthly payment on revolving accounts at the time the credit profile was pulled.
InquiriesLast6Months: Number of inquiries in the past six months at the time the cre...
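The rating code table and the yield definitions above translate directly into code. Here is a sketch in which all rates are proportions. The function and argument names are hypothetical, and the example inputs are invented.

```python
# Numeric-to-alpha Prosper Rating codes, as listed for ProsperRating (numeric).
PROSPER_RATING_ALPHA = {0: "N/A", 1: "HR", 2: "E", 3: "D", 4: "C", 5: "B", 6: "A", 7: "AA"}


def estimated_effective_yield(borrower_rate, servicing_fee_rate,
                              est_uncollected_interest, est_late_fees):
    """Borrower rate minus servicing fee and uncollected interest, plus late fees."""
    return borrower_rate - servicing_fee_rate - est_uncollected_interest + est_late_fees


def estimated_return(effective_yield, estimated_loss_rate):
    """EstimatedReturn = EstimatedEffectiveYield - EstimatedLoss."""
    return effective_yield - estimated_loss_rate


# Invented inputs: 20% borrower rate, 1% fee, 2% uncollected interest, 0.5% late fees.
eff = estimated_effective_yield(0.20, 0.01, 0.02, 0.005)
ret = estimated_return(eff, 0.06)
print(PROSPER_RATING_ALPHA[5], round(eff, 3), round(ret, 3))
```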
Data Description
1. id: Uniquely identifies every loan in the dataset.
2. member_id: Identifies the borrower who applied for the loan.
3. loan_amnt: The listed amount of the loan applied for by the borrower.
4. funded_amnt: The amount sanctioned by LC.
5. term: The number of payments on the loan. Values are in months and can be either 36 or 60.
6. int_rate: Interest rate on the loan.
7. installment: The monthly payment owed by the borrower if the loan originates.
8. grade: LC-assigned loan grade, which depends on the borrower's credit score.
9. sub_grade: LC-assigned loan subgrade.
10. emp_title: The job title supplied by the borrower when applying for the loan.*
11. emp_length: Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years.
12. home_ownership: The home ownership status provided by the borrower during registration or obtained from the credit report. Possible values are RENT, OWN, MORTGAGE, OTHER.
13. annual_inc: The self-reported annual income provided by the borrower during registration.
14. verification_status: Indicates whether the income was verified by LC, not verified, or whether the income source was verified.
15. issue_d: The month in which the loan was funded.
16. loan_status: Current status of the loan.
17. purpose: A category, in the form of a code, indicating the purpose of the loan.
18. title: Explains the 'purpose' of the loan.
19. dti: The debt-to-income ratio: how much the borrower owes each month relative to the borrower's monthly income.
20. delinq_2yrs: The number of delinquencies (late installment payments) by the borrower in the past 2 years.
21. earliest_cr_line: The month and year the borrower's earliest reported credit line was opened.
22. inq_last_6mths: Inquiries for loans made by the borrower over the past 6 months.
23. mths_since_last_delinq: Months since the borrower last missed a timely installment payment.
24. open_acc: The number of open credit lines in the borrower's credit file.
25. pub_rec: Number of derogatory public records.
26. revol_bal: Total credit revolving balance.
27. revol_util: Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
28. total_acc: The total number of credit lines currently in the borrower's credit file.
29. initial_list_status: The initial listing status of the loan. Possible values are W (whole) and F (fractional).
30. out_prncp: Remaining outstanding principal for the total amount funded.
31. total_pymnt: Payments received to date for the total amount funded.
32. total_rec_prncp: Principal received to date.
33. total_rec_int: Interest received to date.
34. total_rec_late_fee: Late fees received to date.
35. recoveries: Total recovery procedures initiated against the borrower.
36. collection_recovery_fee: The fees collected during the recovery procedures.
37. last_pymnt_d: The last month in which a payment was received.
38. last_pymnt_amnt: The last payment amount received.
39. next_pymnt_d: Next scheduled payment date.
40. last_credit_pull_d: The most recent month LC pulled credit for this loan.
41. collections_12_mths_ex_med: Number of collections in the past 12 months, excluding medical collections.
42. mths_since_last_major_derog: Months since the most recent 90-day delinquency or worse rating.
43. application_type: Indicates whether the loan is an individual application or a joint application with two co-borrowers.
44. annual_inc_joint: The combined self-reported annual income provided by the co-borrowers during registration.
45. dti_joint: A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income.
46. acc_now_delinq: The number of accounts on which the borrower is now delinquent.
47. tot_coll_amt: Total collection amounts ever owed by the borrower.
48. tot_cur_bal: Total current balance of all accounts owned by the borrower.
49. total_rev_hi_lim: Total high credit/credit limit.
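The `installment` column is tied to the loan principal, `int_rate`, and `term`. The sketch below uses the standard fixed-rate amortization formula with monthly compounding. It assumes the payment is computed on the funded amount and that `int_rate` has been converted to a proportion (0.12 for 12%). Lending Club's own rounding may differ slightly from this result.

```python
def monthly_installment(principal: float, annual_rate: float, n_months: int) -> float:
    """Fixed monthly payment for a fully amortizing loan.

    payment = P * r / (1 - (1 + r) ** -n), with r = annual_rate / 12.
    """
    if n_months <= 0:
        raise ValueError("n_months must be positive")
    r = annual_rate / 12
    if r == 0:
        return principal / n_months  # no interest: repay principal evenly
    return principal * r / (1 - (1 + r) ** -n_months)


print(round(monthly_installment(10_000, 0.12, 36), 2))  # 332.14
```

This formula can be used as a consistency check: a large gap between the computed and recorded `installment` may indicate a parsing problem with `int_rate` or `term`.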
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides information about people applying for loans, including details on their personal background, finances, and loan specifics. It's meant to help us better understand how different personal factors impact whether a loan gets approved. The data includes things like the applicant's age, income, home ownership status, job history, and credit score, along with loan details such as the loan amount, interest rate, and purpose. It also shows whether the loan was approved or denied.
Features in the dataset:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in Mexico was last recorded at 8.50 percent. This dataset provides Mexico Interest Rate actual values, historical data, a forecast, a chart, statistics, an economic calendar, and news.
By Zillow Data [source]
This dataset, Negative Equity in the US Housing Market, provides an in-depth look at negative equity across the United States during a single quarter. It includes metrics such as the total amount of negative equity (in millions of dollars), the total number of homes in negative equity, and the percentage of mortgaged homes that are in negative equity. These data points provide insight into regional and national trends in the prevalence and rate of mortgage delinquency stemming from declines in home value from peak levels.
Home types available for analysis include all homes, condos/co-ops, multifamily buildings with five or more housing units, and duplexes/triplexes. Cash-buyer rates for particular areas can also be determined from this collection. Further metrics, such as mortgage affordability and impacts on overall indebtedness, can be calculated using information related to Zillow's Home Value Index (ZHVI) forecast methodology and TransUnion data, respectively.
Other variables include region type (e.g., city, county), size rank based on population, percentage change in ZHVI since peak levels, and the share of loans with a loan-to-value ratio greater than 200 (NE) across all included regions. Zillow's own Secondary Mortgage Market Survey data is used to obtain average mortgage quote rates, while Census Bureau NCHS median household income figures represent the typical relationship between wages and debt obligations. Whether you want to assess effects at the metro level or in detail by ZIP code, this dataset should support insightful exploration. Users must adhere to all conditions of the lead data provider's Terms of Use before accessing any of the included resources.
- Analyzing regional and state trends in negative equity: Analyze geographic differences in the percentage of mortgages “underwater”, total amount of negative equity, number of homes at least 90 days late, and other key indicators to provide insight into the factors influencing negative equity across regions, states and cities.
- Tracking the recovery rate over time: Track short-term changes in numbers related to negative equity (e.g., region or area ZHVI Change from Peak) to monitor recovery rates over time as well as how different policy interventions are affecting homeownership levels in affected areas.
- Exploring best practices for promoting housing affordability: Compare affordability metrics (e.g., mortgage payments, price-to-income ratios) across different geographic locations over time to identify best practices for empowering homeowners and promoting stability within the housing market while reducing local inequality impacts related to availability of affordable housing options and access to credit markets like mortgages/loans etc
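The core quantity behind these analyses is simple: a home is in negative equity when its outstanding mortgage balance exceeds its value. The sketch below computes the headline metrics from a list of value/balance pairs for mortgaged homes. That input format is an assumption for illustration, since the CSV reports these metrics already aggregated by region.

```python
def negative_equity(home_value: float, mortgage_balance: float) -> float:
    """Amount by which the outstanding mortgage exceeds the home's value (0 if none)."""
    return max(0.0, mortgage_balance - home_value)


def summarize_negative_equity(homes):
    """homes: list of (home_value, mortgage_balance) pairs for mortgaged homes."""
    amounts = [negative_equity(value, balance) for value, balance in homes]
    underwater = sum(1 for amount in amounts if amount > 0)
    return {
        "total_negative_equity": sum(amounts),
        "homes_in_negative_equity": underwater,
        "pct_in_negative_equity": 100 * underwater / len(homes) if homes else 0.0,
    }


# Invented example: two of four mortgaged homes are underwater.
homes = [(200_000, 250_000), (300_000, 150_000), (100_000, 220_000), (400_000, 400_000)]
print(summarize_negative_equity(homes))
```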
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: NESummary_2017Q1_Public.csv

| Column name | Description |
|:------------|:------------|
| RegionType | The type of region (e.g., city, county, metro, etc.) (String) |
| City | Name of the city (String) |
| County | Name of the county (String) |
| State | Name of the state (String) |
| Metro ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for the 30-year mortgage rate reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit, and currency.
This hosted feature layer has been published in RI State Plane Feet NAD 83. The RI Neighborhood Stabilization Program (NSP) Mapping analysis was performed to help the Office of Housing and Community Development identify target areas with both a Foreclosure Rate (Block Group Level) >= 6.5% and a Subprime Loan percentage rate >= 1.4% (Zip Code Level). Based on these criteria, the following communities were identified as containing such target areas: Central Falls, Cranston, Cumberland, East Providence, Johnston, North Providence, Pawtucket, Providence, Warwick, West Warwick, and Woonsocket. Federal funding under the Housing and Economic Recovery Act of 2008 (HERA) Neighborhood Stabilization Program (NSP), totaling $19.6 million, will be spent in these NSP Target Areas to assist in the rehabilitation and redevelopment of abandoned and foreclosed homes, stabilizing communities. The State of Rhode Island distributes the allocated funds, giving priority to the areas with the greatest need, including those with: 1) the highest percentage of home foreclosures; 2) the highest percentage of homes financed by subprime mortgage loans; and 3) anticipated increases in the rate of foreclosure. The RI Office of Housing and Community Development, with the assistance of Rhode Island Housing, used the following sources to meet these requirements. 1) The U.S. Department of Housing & Urban Development (HUD) developed foreclosure data to help grantees identify Target Areas. The State used HUD's predictive foreclosure rates to identify areas likely to face a significant rise in the rate of home foreclosures. HUD's methodology factored Home Mortgage Disclosure Act data, income, unemployment, and other information into its calculation. The results were analyzed and showed a high level of consistency with other available needs data. 2) The State obtained subprime mortgage loan information from the Federal Reserve Bank of Boston.
Though the data does not include all mortgages, and was only available at the zip code level rather than Census Tract, findings were generally consistent with other need categories. This data was joined to the Foreclosure dataset in order to select areas with both a Foreclosure Rate >=6.5% and a Subprime Loan Rate >=1.4%. 3) The State also obtained, from the Warren Group, actual local foreclosure transaction records. The Warren Group is a source for real estate and banking news and transaction data throughout New England. This entity has analyzed local deed records in assembling information presented. The data set was normalized due to potential limitations. An analysis revealed a high level of consistency with HUD-predictive foreclosure rates.
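The selection rule described above (a block-group Foreclosure Rate >= 6.5% joined to a zip-level Subprime Loan Rate >= 1.4%) can be sketched as a simple join and filter. This is an illustration under stated assumptions, not the State's actual GIS workflow: the field names (block_group, zip_code, foreclosure_rate) are hypothetical, each block group is assumed to map to a single zip code, and the sample values are made up.

```python
from dataclasses import dataclass

# Thresholds from the NSP criteria described above (percent).
FORECLOSURE_THRESHOLD = 6.5
SUBPRIME_THRESHOLD = 1.4


@dataclass
class BlockGroup:
    block_group: str         # hypothetical block-group identifier
    zip_code: str            # hypothetical: zip code the block group is assigned to
    foreclosure_rate: float  # percent, block-group level


def select_target_areas(block_groups, subprime_by_zip):
    """Join block-group foreclosure rates to zip-level subprime rates and
    return the block groups that meet both thresholds."""
    targets = []
    for bg in block_groups:
        subprime = subprime_by_zip.get(bg.zip_code)
        if subprime is None:
            continue  # no zip-level subprime data: the rule cannot be evaluated
        if bg.foreclosure_rate >= FORECLOSURE_THRESHOLD and subprime >= SUBPRIME_THRESHOLD:
            targets.append(bg.block_group)
    return targets


# Made-up sample inputs for illustration only.
groups = [
    BlockGroup("BG-1", "Z1", 7.2),
    BlockGroup("BG-2", "Z1", 5.0),
    BlockGroup("BG-3", "Z2", 8.1),
]
subprime = {"Z1": 1.9, "Z2": 1.1}
print(select_target_areas(groups, subprime))  # ['BG-1']
```

BG-3 passes the foreclosure test but fails the subprime test, which is why the join matters: each criterion is measured at a different geographic level.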
The dataset has the following columns:
Age: Age of the client (numeric)
Job: Type of job (categorical)
Marital: Marital status (categorical)
Education: Education level (categorical)
Default: Has credit in default? (categorical)
Housing: Has housing loan? (categorical)
Loan: Has personal loan? (categorical)
Contact: Contact communication type (categorical)
Day: Last contact day of the month (numeric)
Month: Last contact month of the year (categorical)
Duration: Last contact duration in seconds (numeric) [Note: Highly influential on the output]
Campaign: Number of contacts during this campaign for this client (numeric)
pdays: Number of days since the client was last contacted from a previous campaign (numeric)
Previous: Number of contacts performed before this campaign and for this client (numeric)
Poutcome: Outcome of the previous marketing campaign (categorical)
emp.var.rate: Employment variation rate - quarterly indicator (numeric)
cons.price.idx: Consumer price index - monthly indicator (numeric)
cons.conf.idx: Consumer confidence index - monthly indicator (numeric)
euribor3m: Euribor 3-month rate - daily indicator (numeric)
nr.employed: Number of employees - quarterly indicator (numeric)
The target variable is binary (yes/no) and records whether the client subscribed to the service offered by the bank; it is the variable the model predicts.
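A minimal sketch of loading records with this schema, assuming the data arrives as CSV text with a header row, that column names match the list above after lower-casing, and that the target column is named y (a hypothetical name; the description does not give it). Numeric columns become floats, categorical columns stay strings, and the yes/no target becomes 1/0.

```python
import csv
import io

# Numeric columns as listed in the dataset description; all others are categorical.
NUMERIC_COLUMNS = {
    "age", "day", "duration", "campaign", "pdays", "previous",
    "emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed",
}
TARGET_COLUMN = "y"  # hypothetical name for the yes/no target column


def parse_rows(text, delimiter=","):
    """Parse CSV text into typed records with lower-cased column names."""
    records = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        record = {}
        for name, value in row.items():
            key = name.strip().lower()
            if key == TARGET_COLUMN:
                # Anything other than "yes" is treated as 0.
                record[key] = 1 if value.strip().lower() == "yes" else 0
            elif key in NUMERIC_COLUMNS:
                record[key] = float(value)
            else:
                record[key] = value.strip()  # categorical: keep the label as a string
        records.append(record)
    return records


sample = "Age,Job,Duration,y\n41,admin.,210,yes\n35,technician,95,no\n"
rows = parse_rows(sample)
print(rows[0])  # {'age': 41.0, 'job': 'admin.', 'duration': 210.0, 'y': 1}
```

The delimiter is a parameter because the description does not say how the file is separated.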
DESCRIPTION
A banking institution requires actionable insights into mortgage-backed securities, geographic business investment, and real estate analysis. The mortgage bank would like to identify potential monthly mortgage expenses for each region based on monthly family income and real estate rental. A statistical model needs to be created to predict the potential loan demand, in dollars, for each region in the USA. There is also a need for a dashboard that refreshes periodically after data is retrieved from the agencies. The dashboard must show relationships and trends for the following key metrics: number of loans, average rental income, monthly mortgage and owner's cost, and family income versus mortgage cost across different regions. The dashboard is not limited to these metrics.
Dataset Description
Variable: Description
Second mortgage: Statistics on households with a second mortgage
Home equity: Statistics on households with a home equity loan
Debt: Statistics on households with any type of debt
Mortgage Costs: Statistics regarding mortgage payments, home equity loans, utilities, and property taxes
Home Owner Costs: Statistics on the sum of utilities and property taxes
Gross Rent: Contract rent plus the estimated average monthly cost of utility features
High school Graduation: High school graduation statistics
Population Demographics: Population demographics statistics
Age Demographics: Age demographic statistics
Household Income: Total income of people residing in the household
Family Income: Total income of people related to the householder
Project Task: Week 1
Data Import and Preparation:
Import data.
Identify the primary key and determine whether indexing is required.
Gauge the fill rate of each variable and devise a plan for missing-value treatment. Explain explicitly the reason for the treatment chosen for each variable.
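Fill rate can be measured as the share of non-missing values per column. A minimal sketch, assuming records are dictionaries and that a value counts as missing when it is absent, None, an empty string, or NaN; the column name rent_mean is hypothetical, and pop is taken from the population-density task below. Choosing a treatment for each column remains a judgment call.

```python
import math


def fill_rates(records, columns):
    """Return the share (0.0 to 1.0) of non-missing values for each column."""
    n = len(records)
    rates = {}
    for col in columns:
        filled = 0
        for rec in records:
            value = rec.get(col)  # absent key counts as missing
            if value is None or value == "":
                continue
            if isinstance(value, float) and math.isnan(value):
                continue
            filled += 1
        rates[col] = filled / n if n else 0.0
    return rates


records = [
    {"pop": 100, "rent_mean": 900.0},
    {"pop": 250, "rent_mean": float("nan")},
    {"pop": None, "rent_mean": 1100.0},
    {"pop": 80},
]
print(fill_rates(records, ["pop", "rent_mean"]))  # {'pop': 0.75, 'rent_mean': 0.5}
```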
Exploratory Data Analysis (EDA):
Perform debt analysis. You may take the following steps:
Explore the top 2,500 locations with the highest percentage of households with a second mortgage, restricted to locations where percent ownership is above 10 percent. Visualize them on a geo-map. You may cap the percentage of households with a second mortgage at 50 percent
Use the following bad debt equation:
Bad Debt = P(Second Mortgage ∪ Home Equity Loan) = second_mortgage + home_equity - home_equity_second_mortgage
Create pie charts to show overall debt and bad debt
Create box-and-whisker plots and analyze the distribution of second mortgage, home equity, good debt, and bad debt for different cities
Create a collated income distribution chart for family income, household income, and remaining income
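The first three debt-analysis steps can be sketched as small helpers. The bad-debt formula subtracts the overlap once, which is the inclusion-exclusion count of households with a second mortgage or a home equity loan. Assumptions: shares are stored as fractions between 0 and 1; second_mortgage, home_equity, and home_equity_second_mortgage come from the formula above, while pct_own, debt, and city are hypothetical column names; good debt is assumed to be debt minus bad debt, since the brief does not define it.

```python
import statistics
from collections import defaultdict


def top_second_mortgage_locations(rows, n=2500, min_pct_own=0.10, max_second_mortgage=0.50):
    """Keep rows with pct_own above min_pct_own and second_mortgage at or below
    max_second_mortgage, then return the n rows with the highest second_mortgage."""
    eligible = [
        r for r in rows
        if r["pct_own"] > min_pct_own and r["second_mortgage"] <= max_second_mortgage
    ]
    eligible.sort(key=lambda r: r["second_mortgage"], reverse=True)
    return eligible[:n]


def add_debt_fields(row):
    """Add bad_debt (the formula above) and good_debt (assumed: debt - bad_debt)."""
    bad = row["second_mortgage"] + row["home_equity"] - row["home_equity_second_mortgage"]
    return {**row, "bad_debt": bad, "good_debt": row["debt"] - bad}


def box_summary_by_city(rows, column):
    """Five-number summary (min, Q1, median, Q3, max) of `column` for each city."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["city"]].append(r[column])
    summary = {}
    for city, values in groups.items():
        if len(values) < 2:
            continue  # quartiles need at least two data points
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[city] = (min(values), q1, median, q3, max(values))
    return summary
```

The summary places whiskers at the minimum and maximum; plotting libraries such as matplotlib default to whiskers at 1.5 times the interquartile range, so plotted boxes may show outliers that this summary folds into the whiskers.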
Perform EDA and come up with insights into population density and age. You may have to derive new fields (make sure to use weighted averages for accurate measurements):
Use pop and ALand variables to create a new field called population density
Use male_age_median, female_age_median, male_pop, and female_pop to create a new field called median age
Visualize the findings using appropriate chart type
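The two derived fields can be sketched as follows, using the column names given in the steps above. Population density is pop divided by ALand, in whatever area unit ALand uses. Median age is computed as a population-weighted average of the male and female medians; this is an approximation, since a weighted average of two medians is not in general the median of the combined population.

```python
import math


def population_density(pop, aland):
    """People per unit of land area (ALand's unit carries through)."""
    return pop / aland if aland else math.nan


def weighted_median_age(male_age_median, female_age_median, male_pop, female_pop):
    """Population-weighted average of the male and female median ages."""
    total = male_pop + female_pop
    if total == 0:
        return math.nan
    return (male_age_median * male_pop + female_age_median * female_pop) / total


print(population_density(5000, 2_000_000))          # 0.0025
print(weighted_median_age(38.0, 42.0, 1000, 3000))  # 41.0
```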
Create a new variable that bins the population, choosing a class interval such that there are no more than 5 categories, for ease of analysis.
Analyze the married, separated, and divorced population for these population brackets
Visualize using appropriate chart type
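A minimal sketch of the binning step using equal-width bins, one possible choice of class interval. Quantile-based bins are a common alternative when population is skewed and some equal-width bins would be nearly empty. The column names married, separated, and divorced (each a share of the population) are hypothetical.

```python
from collections import defaultdict


def equal_width_bins(values, k=5):
    """Label each value with an equal-width bin index 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def marital_by_population_bin(rows, k=5):
    """Mean married/separated/divorced shares for each population bin."""
    labels = equal_width_bins([r["pop"] for r in rows], k)
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # sums plus row count
    for label, r in zip(labels, rows):
        t = totals[label]
        t[0] += r["married"]
        t[1] += r["separated"]
        t[2] += r["divorced"]
        t[3] += 1
    return {
        label: (t[0] / t[3], t[1] / t[3], t[2] / t[3])
        for label, t in sorted(totals.items())
    }


print(equal_width_bins([100, 200, 300, 400, 500, 600]))  # [0, 1, 2, 3, 4, 4]
```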
Please detail your observations for rent as a percentage of income at an overall level, and for different states.
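Rent as a percentage of income can be computed per location and then averaged overall and by state. A minimal sketch, assuming hypothetical columns rent_mean (monthly rent), hi_mean (annual household income), and state; monthly rent is annualized before dividing. The averages are unweighted means of the per-location ratios; weighting by household count would be an alternative.

```python
import math
import statistics
from collections import defaultdict


def rent_share_of_income(rows):
    """Return (overall mean %, {state: mean %}) of annual rent over annual income.
    Rows whose income is None or 0 are skipped."""
    overall = []
    by_state = defaultdict(list)
    for r in rows:
        income = r.get("hi_mean")
        if not income:
            continue  # ratio undefined without income
        pct = 100 * 12 * r["rent_mean"] / income
        overall.append(pct)
        by_state[r["state"]].append(pct)
    if not overall:
        return math.nan, {}
    return statistics.mean(overall), {s: statistics.mean(v) for s, v in by_state.items()}


rows = [
    {"state": "A", "rent_mean": 1000, "hi_mean": 48000},
    {"state": "B", "rent_mean": 500, "hi_mean": 60000},
    {"state": "B", "rent_mean": 900, "hi_mean": 0},
]
print(rent_share_of_income(rows))  # (17.5, {'A': 25.0, 'B': 10.0})
```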
Perform correlation analysis for all the relevant variables by creating a heatmap. Describe your findings.
Project Task: Week 2
Data Pre-processing:
The economic multivariate data has a significant number of measured variables. The goal is to determine whether the measured variables depend on a smaller number of unobserved common factors, or latent variables.
Each variable is assumed to be dependent upon a linear combination of the common factors, and the coefficients are known as loadings. Each measured variable also includes a component due to independent random variability, known as “specific variance” because it is specific to one variable. Obtain the common factors and then plot the loadings. Use factor analysis to find latent variables in our dataset and gain insight into the linear relationships in the data.
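The factor model described above can be written compactly. The notation (p observed variables, k common factors, loading matrix Λ, specific variances Ψ) is standard textbook notation chosen here, not taken from the project text. The covariance identity in the last line holds under the usual assumptions: the factors have zero mean and identity covariance and are uncorrelated with the specific errors ε, whose covariance Ψ is diagonal.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\mu} + \Lambda \mathbf{f} + \boldsymbol{\varepsilon},
  \qquad \mathbf{x} \in \mathbb{R}^{p},\ \mathbf{f} \in \mathbb{R}^{k},\ k < p \\
x_i &= \mu_i + \sum_{j=1}^{k} \lambda_{ij} f_j + \varepsilon_i, \qquad i = 1, \ldots, p \\
\operatorname{Cov}(\mathbf{x}) &= \Lambda \Lambda^{\top} + \Psi,
  \qquad \Psi = \operatorname{diag}(\psi_1, \ldots, \psi_p)
\end{aligned}
```

The loadings to plot are the entries λ_ij of Λ, and each ψ_i is the specific variance of variable i. scikit-learn's sklearn.decomposition.FactorAnalysis is one library that fits this model.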
The latent variables are as follows:
High school graduation rates
Median population age
Second mortgage statistics
Percent own
Bad debt expense
Data Modeling:
Build a linear regression model to predict the total monthly expenditure on home mortgage loans.
Refer to deplotment_RE.xlsx. Column hc_mortgage_mean is the predicted variable: the mean monthly mortgage and owner costs of the specified geographical location.
Note: Exclude loans from prediction model which have NaN (Not a Numb...
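A dependency-free sketch of ordinary least squares for a target such as hc_mortgage_mean: it drops rows containing NaN, adds an intercept column, and solves the normal equations by Gauss-Jordan elimination. The feature values below are made up and chosen to fit exactly. In practice numpy.linalg.lstsq or scikit-learn's LinearRegression are more numerically robust, since forming the normal equations squares the condition number.

```python
import math


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by least squares.
    Rows with NaN in any feature or in the target are dropped first."""
    rows = [
        (xs, t) for xs, t in zip(X, y)
        if not math.isnan(t) and not any(math.isnan(v) for v in xs)
    ]
    if not rows:
        raise ValueError("no complete rows to fit")
    A = [[1.0, *xs] for xs, _ in rows]  # prepend the intercept column
    targets = [t for _, t in rows]
    p = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * t for a, t in zip(A, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear or there are too few rows")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coefs, xs):
    """Apply fitted coefficients (intercept first) to one feature vector."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], xs))


# Made-up data following y = 2 + 3*x1 - x2; the NaN row is excluded.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3], [float("nan"), 1]]
y = [2, 5, 1, 4, 5, 7]
coefs = fit_ols(X, y)
print([round(c, 6) for c in coefs])
```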
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in China was last recorded at 3 percent. This dataset provides the latest reported value for - China Interest Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in Norway was last recorded at 4.50 percent. This dataset provides the latest reported value for - Norway Interest Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mortgage applications in the United States decreased by 3.90 percent in the week ending May 30, 2025, compared with the previous week. This dataset provides - United States MBA Mortgage Applications - actual values, historical data, forecast, chart, statistics, economic calendar and news.