License: https://creativecommons.org/publicdomain/zero/1.0/
A simple yet challenging project: predict housing prices from factors such as house area, number of bedrooms, furnishing status, and nearness to a main road. The dataset is small, yet its complexity arises from strong multicollinearity among the predictors. Can you overcome these obstacles and build a decent predictive model?
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102. Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
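The multicollinearity mentioned in the description can be checked before any model is fitted. The sketch below is a minimal illustration, assuming only two predictors: it computes their Pearson correlation and the variance inflation factor (VIF) for that two-predictor case, where VIF = 1 / (1 - r²). The function names and the sample values are hypothetical and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def vif_two_predictors(xs, ys):
    """VIF of one predictor when the only other predictor is the second one."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical values for illustration only.
area = [1200, 1500, 1800, 2400, 3000, 3600]
bedrooms = [2, 2, 3, 3, 4, 5]

print("correlation:", round(pearson(area, bedrooms), 3))
print("VIF:", round(vif_two_predictors(area, bedrooms), 2))
```

A VIF close to 1 suggests little shared variance. With more than two predictors, each VIF comes from regressing one predictor on all the others, which this simplified sketch does not do.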
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
House Price Index YoY in the United States decreased to 1.70 percent in September from 2.40 percent in August of 2025. This dataset includes a chart with historical data for the United States FHFA House Price Index YoY.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 30 Year Mortgage Rate in the United States decreased to 6.23 percent on November 26 from 6.26 percent in the previous week. This dataset includes a chart with historical data for the United States 30 Year Mortgage Rate.
License: https://creativecommons.org/publicdomain/zero/1.0/
The interest rate set by the Federal Reserve is a crucial tool for promoting economic conditions that meet the mandate established by the United States Congress, which includes high employment, low and stable inflation, sustainable economic growth, and the moderation of long-term interest rates. The interest rates determined by the Fed directly influence the cost of credit, making financing either more accessible or more restrictive. When interest rates are low, there is a greater incentive for consumers to purchase homes through mortgages, finance automobiles, or undertake home renovations. Additionally, businesses are encouraged to invest in expanding their operations, whether by purchasing new equipment, modernizing facilities, or hiring more workers. Conversely, higher interest rates tend to curb such activity, discouraging borrowing and slowing economic expansion.
The dataset analyzed contains information on the economic conditions in the United States on a monthly basis since 1954, including the federal funds rate, which represents the percentage at which financial institutions trade reserves held at the Federal Reserve with each other in the interbank market overnight. This rate is determined by the market but is directly influenced by the Federal Reserve through open market operations to reach the established target. The Federal Open Market Committee (FOMC) meets eight times a year to determine the federal funds rate target, which has been defined within a range with upper and lower limits since December 2008.
Furthermore, real Gross Domestic Product (GDP) is calculated based on the seasonally adjusted quarterly rate of change in the economy, using chained 2009 dollars as a reference. The unemployment rate represents the seasonally adjusted percentage of the labor force that is unemployed. Meanwhile, the inflation rate is determined by the monthly change in the Consumer Price Index, excluding food and energy prices for a more stable analysis of core inflation.
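The inflation measure described above is a month-over-month change in a price index. As a minimal sketch, assuming the index readings are available as a plain list in chronological order, the percent change can be computed as follows. The function name and the sample readings are hypothetical.

```python
def monthly_pct_change(index_values):
    """Percent change between each pair of consecutive monthly readings."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(index_values, index_values[1:])
    ]


# Hypothetical core CPI readings for four consecutive months.
core_cpi = [250.0, 250.5, 251.0, 251.8]

for month, rate in enumerate(monthly_pct_change(core_cpi), start=2):
    print(f"month {month}: {rate:.3f}%")
```

The result has one fewer entry than the input, because the first month has no previous reading to compare against.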
The interest rate data was sourced from the Federal Reserve Bank of St. Louis' economic data portal, while GDP information was provided by the U.S. Bureau of Economic Analysis, and unemployment and inflation data were made available by the U.S. Bureau of Labor Statistics.
The analysis of this data helps to understand how economic growth, the unemployment rate, and inflation influence the Federal Reserve’s monetary policy decisions. Additionally, it allows for a study of the evolution of interest rate policies over time and raises the question of how predictable the Fed’s future decisions may be. Based on observed trends, it is possible to speculate whether the target range set in March 2017 will be maintained, lowered, or increased, considering the prevailing economic context and the challenges faced in conducting U.S. monetary policy.
License: https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset explores the potential relationship between art presence and property prices in London neighborhoods. We measured the proportion of Flickr photographs tagged with the keyword ‘art’ and compared it with residential property price gains for each Inner London neighborhood, looking for associations between art presence and housing value. Our findings illustrate how the visual environment of a neighborhood relates to its socio-economic conditions. With this dataset, we aim to show how online platforms can be leveraged for quantitative data collection and analysis that helps us better understand our urban settings.
This dataset can be used to investigate the relationship between art presence and property prices in London neighborhoods. The dataset includes three columns – Postcode.District, Rank.Mean.Change, and Proportion.Art.Photos – which provide quantitative analyses of the association between art presence and price gains for London neighborhoods.
To use this dataset, first identify the postcode district you are interested in by consulting a street list or the PostCodeSearcher website, which lists the postcodes for each London neighborhood (http://postcodesearcher.com/london). Postcode districts demarcate particular areas (for example, W2 covers Bayswater), so this makes it easy to find properties within each neighborhood.
Once you have identified a postcode district, review the ‘Rank.Mean.Change’ column to see how residential property prices changed relative to other areas of Inner London over the 2010-13 period. The description gives the scale as 1 = highest gain and 25 = lowest gain, but it also states that higher rank numbers indicate higher price gains, so confirm the direction of the ranking against the data before interpreting it. Comparing ranks across neighborhoods indicates which areas offered more price growth over the period, and looking at a single district shows how its past growth compares with the others, which can help in judging whether further investment is worthwhile.
Finally, consider how art presence relates to the change in property prices. The ‘Proportion.Art.Photos’ column gives the proportion of Flickr photographs in each district that are tagged with the keyword 'art', which helps characterize the visual environment of different localities. Comparing these proportions across districts shows how strongly the visual character of an area is associated with price changes. For example, one could ask whether the kind of imagery (urban landscapes, abstracts, or portraits) played any role in relative gains over the past few years. Such comparisons inform our understanding of the potential impact of art presence on price changes, including during volatile market periods.
By combining this data with other datasets on demographics, infrastructure, and socioeconomic conditions in London's different areas, we can gain further insight and make better-informed decisions about investing in particular locations.
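The comparison between the rank and art-proportion columns described above can be sketched with a Spearman rank correlation. This is a minimal illustration using only the standard library and assuming tie-free values. The column names come from the dataset description, while the rows and the helper functions are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical rows; real values must be read from the dataset file.
rows = [
    {"Postcode.District": "W2", "Rank.Mean.Change": 12.5, "Proportion.Art.Photos": 0.021},
    {"Postcode.District": "E1", "Rank.Mean.Change": 20.0, "Proportion.Art.Photos": 0.034},
    {"Postcode.District": "N1", "Rank.Mean.Change": 17.0, "Proportion.Art.Photos": 0.029},
    {"Postcode.District": "SW3", "Rank.Mean.Change": 5.0, "Proportion.Art.Photos": 0.015},
]

rho = spearman(
    [row["Rank.Mean.Change"] for row in rows],
    [row["Proportion.Art.Photos"] for row in rows],
)
print("Spearman rho:", rho)
```

The sign of the coefficient is only meaningful once the direction of the ranking has been confirmed, and a correlation on its own does not show that art presence causes price changes.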
- Use this dataset to develop a predictive analytics model to identify areas in London most likely to experience an increase in residential property prices associated with the presence of art.
- Use this dataset to develop strategies and policies that promote both artistic expression and urban development in Inner London neighborhoods.
- Compare the presence of art across inner London boroughs, as well as against other cities, to gain insight into the socio-economic conditions related to the visual environment of a city and its impact on life quality for citizens
If you use this dataset in your research, please credit the original authors. Data Source
**License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons.org/publicd...
License: https://creativecommons.org/publicdomain/zero/1.0/
The Federal Reserve sets interest rates to promote conditions that achieve the mandate set by the Congress — high employment, low and stable inflation, sustainable economic growth, and moderate long-term interest rates. Interest rates set by the Fed directly influence the cost of borrowing money. Lower interest rates encourage more people to obtain a mortgage for a new home or to borrow money for an automobile or for home improvement. Lower rates encourage businesses to borrow funds to invest in expansion such as purchasing new equipment, updating plants, or hiring more workers. Higher interest rates restrain such borrowing by consumers and businesses.
This dataset includes data on the economic conditions in the United States on a monthly basis since 1954. The federal funds rate is the interest rate at which depository institutions trade federal funds (balances held at Federal Reserve Banks) with each other overnight. The rate that the borrowing institution pays to the lending institution is determined between the two banks; the weighted average rate for all of these types of negotiations is called the effective federal funds rate. The effective federal funds rate is determined by the market but is influenced by the Federal Reserve through open market operations to reach the federal funds rate target. The Federal Open Market Committee (FOMC) meets eight times a year to determine the federal funds target rate; the target rate transitioned to a target range with an upper and lower limit in December 2008. The real gross domestic product is calculated as the seasonally adjusted quarterly rate of change in the gross domestic product based on chained 2009 dollars. The unemployment rate represents the number of unemployed as a seasonally adjusted percentage of the labor force. The inflation rate reflects the monthly change in the Consumer Price Index of products excluding food and energy.
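The description above defines the effective federal funds rate as a weighted average over individual overnight trades. A minimal sketch of that calculation follows, assuming each trade is a (rate, volume) pair and that the weights are trade volumes. The function name and the sample trades are hypothetical, and this is not presented as the method used to publish the official figure.

```python
def effective_rate(trades):
    """Volume-weighted average of (rate_percent, volume) pairs."""
    total_volume = sum(volume for _, volume in trades)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(rate * volume for rate, volume in trades) / total_volume


# Hypothetical overnight trades: (rate in percent, volume in millions of dollars).
trades = [(0.90, 500), (0.91, 1200), (0.93, 300)]
print(f"effective rate: {effective_rate(trades):.4f}%")
```

Larger trades pull the average toward their rate, which is why the result can differ from a simple mean of the quoted rates.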
The interest rate data was published by the Federal Reserve Bank of St. Louis' economic data portal. The gross domestic product data was provided by the US Bureau of Economic Analysis; the unemployment and consumer price index data was provided by the US Bureau of Labor Statistics.
How do economic growth, unemployment, and inflation impact the Federal Reserve's interest rate decisions? How has the interest rate policy changed over time? Can you predict the Federal Reserve's next decision? Will the target range set in March 2017 be increased, decreased, or remain the same?
License: https://cdla.io/sharing-1-0/
Property Listings Dataset: The property listings dataset contains information about real estate properties available for sale or rent in Brazil. It includes details such as property type (apartment, house, commercial property), location (city, neighborhood), size (square footage, number of rooms), price, amenities, and contact information for the property owner or real estate agent. This dataset can be used for market analysis, property valuation, and identifying trends in the real estate market.
Sales and Rental Prices Dataset: The sales and rental prices dataset provides information about the prices of real estate properties in Brazil. It includes data on property transactions, including sale prices and rental prices per square meter or per month. This dataset can be used to analyze price trends, compare property prices across different regions, and identify areas with high or low real estate market demand.
Property Characteristics Dataset: The property characteristics dataset contains detailed information about the features and attributes of real estate properties. It includes data such as the number of bedrooms, bathrooms, parking spaces, floor plan, construction year, building amenities, and property condition. This dataset can be used for property classification, identifying popular property features, and evaluating property quality.
Geographical Data: Geographical data includes information about the location and spatial features of real estate properties in Brazil. It can include data such as latitude and longitude coordinates, zoning information, proximity to amenities (schools, hospitals, parks), and neighborhood demographics. This dataset can be used for spatial analysis, identifying hotspots or desirable locations, and understanding the neighborhood characteristics.
Property Market Trends Dataset: The property market trends dataset provides information about market conditions and trends in the real estate sector in Brazil. It includes data such as the number of property listings, average time on the market, price fluctuations, mortgage interest rates, and economic indicators that impact the real estate market. This dataset can be used for market forecasting, understanding market dynamics, and making informed investment decisions.
Real Estate Regulatory Data: Real estate regulatory data includes information about legal and regulatory aspects of the real estate sector in Brazil. It can include data on property ownership, property taxes, zoning regulations, building permits, and legal restrictions on property transactions. This dataset can be used for legal compliance, understanding property ownership rights, and assessing the legal framework for real estate transactions.
Historical Data: Historical real estate data includes past records and trends of property prices, market conditions, and sales volumes in Brazil. This dataset can span several years and can be used to analyze long-term market trends, compare current market conditions with historical data, and assess the performance of the real estate market over time.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
NATIONAL HOUSE CONSTRUCTION COST INDEX
The index relates to costs ruling on the first day of each month. Up until October 2006 it was known as the National House Building Index.
Oct 2000 data: The index since October 2000 includes the first phase of an agreement following a review of rates of pay and grading structures for the Construction Industry and the first phase increase under the PPF.
April, May and June 2001: Figures revised in July 2001 due to 2% PPF Revised Terms.
March 2002: The drop in the March 2002 figure is due to a decrease in the rate of PRSI from 12% to 10¾% with effect from 1 March 2002. The index from April 2002 excludes the one-off lump sum payment equal to 1% of basic pay on 1 April 2002 under the PPF.
April, May, June 2003: Figures revised in August 2003 due to the backdated increase of 3% from 1 April 2003 under the National Partnership Agreement 'Sustaining Progress'.
April and October 2006: The increases in the index are due to the Social Partnership Agreement "Towards 2016".
Oct 2008 data: Decrease due to a fall in the October Wholesale Price Index.
March 2011: The drop in the March 2011 figure is due to a 7.5% decrease in labour costs.
Methodology in producing the Index
Prior to October 2006: The index relates solely to labour and material costs, which should normally not exceed 65% of the total price of a house. The House Building Cost Index monitors labour costs in the construction industry and the cost of building materials. It does not include items such as overheads, profit, interest charges or land development. The labour costs include insurance cover and the building material costs include V.A.T.
Coverage: The type of construction covered is a typical 3-bedroomed, 2-level local authority house, and the index is applied on a national basis.
Data Collection: The labour costs are based on agreed labour rates, allowances etc. The building material prices are collected at the beginning of each month from the same suppliers for the same representative basket.
Calculation: Labour and material costs for the construction of a typical 3-bedroomed house are weighted together to produce the index.
Post October 2006: The name change from the House Building Cost Index to the House Construction Cost Index was introduced in October 2006, when the method of assessing the materials sub-index was changed from pricing a basket of materials (representative of a typical 2-storey, 3-bedroomed local authority house) to the CSO Table 3 Wholesale Price Index. The new index maintains continuity with the old HBCI.
The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change.
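The calculation step above says that labour and material costs are weighted together, but the weights themselves are not given. The sketch below is therefore only an illustration of a two-component weighted index: the function name, the weight, and the sub-index values are all hypothetical placeholders rather than figures used for the published index.

```python
def combined_index(labour_index, materials_index, labour_weight):
    """Weighted combination of a labour sub-index and a materials sub-index."""
    if not 0.0 <= labour_weight <= 1.0:
        raise ValueError("labour_weight must be between 0 and 1")
    materials_weight = 1.0 - labour_weight
    return labour_weight * labour_index + materials_weight * materials_index


# Hypothetical sub-index values (base period = 100) and a hypothetical weight.
labour = 104.0
materials = 101.0
print(combined_index(labour, materials, labour_weight=0.5))
```

With a fixed weight, a fall in one sub-index (such as the labour cost decreases noted for March 2002 and March 2011) lowers the combined figure in proportion to that component's weight.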
The Chinese economy has undergone a long-term transition reform, but the financial sector still retains a characteristic of the planned economy: financial repression. Under financial repression, China’s actual interest rate should be lower than the rate of increase in the Consumer Price Index (CPI). However, by our calculation from official interest rates and CPI for 1999 to 2022, the interest rate remained higher than the CPI in more than half of the years. This is inconsistent with the financial repression that exists in China, and the main reason is the method used to calculate China’s CPI. China’s CPI measurement system originated in the planned economy era and does not fully consider the rise in housing purchase prices, so inflation can be presented more realistically by taking the rise in housing prices into account. The core idea of this study is to mine relevant official statistical data and calculate the proportion of Chinese residents’ expenditure on purchasing houses relative to their total expenditure. By taking the proportion of house purchases as the weight of the house price factor, and the proportion of other consumption as the weight of the official CPI, the Generalized CPI (GCPI) is formulated. The GCPI is then compared with market interest rates to determine the actual interest rate situation in China over the past 20 years. The study finds that, measured against the GCPI, China’s real interest rates have been negative in most years since 1999. Chinese residents have suffered the negative effects of financial repression over the past 20 years, and their property income has not kept up with the actual losses caused by inflation. The study therefore argues that China’s CPI calculation method should be adjusted to take the rise in housing prices into account, so that China’s actual inflation level is reflected more accurately.
In view of the above, deepening interest rate marketization reform and expanding channels for financial investment are the future development goals of China’s financial system.
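The GCPI construction described above is a weighted average of house price growth and the official CPI, with the housing share of expenditure as the weight. The following is a minimal sketch of that weighting and of the comparison with an interest rate. All function names and figures are hypothetical, and the real rate is approximated here as the simple difference between the nominal rate and inflation, which is an assumption of this sketch.

```python
def generalized_cpi(cpi_pct, house_price_pct, housing_share):
    """Weight house price growth by the housing share and CPI by the remainder."""
    if not 0.0 <= housing_share <= 1.0:
        raise ValueError("housing_share must be between 0 and 1")
    return housing_share * house_price_pct + (1.0 - housing_share) * cpi_pct


def real_rate(nominal_pct, inflation_pct):
    """Approximate real interest rate as nominal rate minus inflation."""
    return nominal_pct - inflation_pct


# Hypothetical annual figures, in percent.
gcpi = generalized_cpi(cpi_pct=2.0, house_price_pct=10.0, housing_share=0.25)
print("GCPI:", gcpi)
print("real rate vs CPI:", real_rate(3.0, 2.0))
print("real rate vs GCPI:", real_rate(3.0, gcpi))
```

In this made-up example the real rate is positive against the official CPI but negative against the GCPI, which is the pattern the study reports for most years since 1999.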
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fixed 30-year mortgage rates in the United States averaged 6.40 percent in the week ending November 21 of 2025. This dataset provides the latest reported value for - United States MBA 30-Yr Mortgage Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains synthetic real estate transaction data for neighborhoods in the Vancouver metropolitan area, covering the period from 2004 to 2024. The dataset simulates market trends, house price fluctuations, and contains features commonly associated with real estate listings, such as property type, number of bedrooms, bathrooms, square footage, and more. No real-world data is used; this dataset is entirely computer-generated for educational and demonstration purposes.
The dataset includes additional features such as:
A price surge from late 2020 to early 2022, reflecting real-world trends during that period of low interest rates.
Randomly introduced outliers and noise to simulate abnormal transactions, such as significantly higher or lower sale prices.
This dataset can be used for educational purposes, machine learning projects, and time series forecasting demonstrations. It is a great tool for practicing data cleaning, exploratory data analysis (EDA), feature engineering, and modeling in the context of real estate.
Columns:
Neighborhood: The neighborhood where the property is located, grouped by city within the Vancouver metropolitan area (e.g., Kitsilano, Mount Pleasant, Guildford).
Year: Year of the transaction (2004-2024).
Season: Season of the year (Spring, Summer, Fall, Winter).
Property Type: Type of the property (House, Condo, Townhouse, Duplex, Triplex).
Bedrooms: Number of bedrooms in the property.
Bathrooms: Number of bathrooms in the property.
Year Built: The year the property was built.
Renovation Year: Year of the most recent renovation, if applicable.
Garage Type: Type of garage (None, Single, Double, Triple).
Square Footage (House): The size of the house in square feet.
Square Footage (Land): The size of the land in square feet.
Basement: Whether the basement is finished or not (Finished, Not Finished).
Legal Units: Number of legal units in the property (e.g., 0-2).
Market Price: The market price of the property for the given year and season, including random noise and seasonal fluctuations.
Usage Notes:
This dataset is synthetic and does not represent actual transactions. It is intended for educational purposes and should not be used for real-world financial or investment decisions.
It can be used for projects focused on time series forecasting, regression analysis, and data visualization.
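Because the dataset deliberately contains injected outliers, a common first cleaning step is to flag unusually high or low prices. The sketch below uses the interquartile range (IQR) rule on a hypothetical list of Market Price values. The function name, the threshold multiplier, and the sample prices are assumptions for illustration, and the dataset's own generation procedure is not known from the description.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical market prices in CAD; the last one mimics an injected outlier.
market_price = [
    500_000, 510_000, 520_000, 530_000,
    540_000, 550_000, 560_000, 5_000_000,
]
print(iqr_outliers(market_price))
```

For a dataset spanning 2004 to 2024, it may be more sensible to apply the rule within each year or neighborhood, since prices that are normal in 2022 could look extreme next to 2004 values.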
From 1 April 2018, Land Transaction Tax (LTT) replaced Stamp Duty Land Tax (SDLT) on residential and non-residential property and land interests purchased in Wales. The tax rates and tax bands for LTT vary depending on the type of transaction. Taxpayers must notify the Welsh Revenue Authority (WRA) of all land transactions with a value above £40,000. There are also circumstances where certain lease transactions are not notifiable if they are less than 7 years in duration. The organisation filing an LTT return has 30 days after the effective date to submit the return and pay the tax. This dataset includes estimates of LTT notifiable transactions received by the WRA by the close of 21 July 2025.
Care should be taken with any comparisons over time which involve data from spring 2020 to summer 2021. This is due to the coronavirus (COVID-19) pandemic and changes to LTT rates. A national lockdown on 23 March 2020 resulted in the housing market being mainly closed from this date until 22 June 2020, when it partially re-opened. The market was re-opened more fully on 27 July, to coincide with a change in LTT rates effective until 30 June 2021. There is evidence some purchasers may have brought their transactions forward to June 2021 to benefit from the temporary tax reduction. There were some changes to LTT rates effective from 22 December 2020, which affected non-residential transactions and higher rates residential transactions. The main residential rates and bands for Land Transaction Tax changed for transactions effective after 10 October 2022.
The dataset focuses on the transactions subject to a relief only and includes a breakdown by:
- relief type: the four main categories of relief plus an ‘other’ category making up the rest
- transaction type: residential, non-residential
- transaction description: conveyance / transfer of ownership, granting a new lease, assignment of lease
- impact on tax due: yes, no
- measure: number of transactions, estimates of the value of tax relieved, and the tax due on the transactions
- effective quarter and year
Reliefs can be claimed on both residential and non-residential properties. Reliefs reduce the amount of tax due when certain conditions are met. Multiple reliefs can be applied to a single transaction, and reliefs may reduce the tax due to zero (known as a full relief) or by a certain percentage or amount (known as a partial relief).
Reliefs are sometimes claimed where they have no impact on the tax due. These can be viewed separately in this dataset, and many of them have been reported unnecessarily by the organisations completing the tax returns. As an example, some of these apply to low value residential transactions. Indications are that they are due to a perceived but mistaken need to claim first time buyer relief (which applies for the predecessor tax, but not to Land Transaction Tax). This is known following queries raised with several agents asking why tax reliefs have been claimed where there is no impact on the value of the tax. Further information about this category of reliefs is provided in example 4 in the key quality information found in the weblinks. That example also describes some adjustments that have been made to more correctly identify the value of tax relieved associated with these transactions. Further adjustments are expected, so the numbers shown here for reliefs where there is no impact on tax due are likely to be revised in future.
On 7 February 2025, legislation relating to multiple dwellings relief was changed so that dwellings that are subsidiary (worth less than a third of the total value of the transaction) must now be treated as part of the primary dwelling in any main rates residential transaction. This means it is now very rare for multiple dwellings relief to apply in a main rates residential transaction, which we estimate will reduce the value of the relief (or increase total LTT revenues) by between £2.0m and £2.5m per year.
The Federal Housing Finance Agency (FHFA) is an independent regulatory agency that is not part of the Department of Housing and Urban Development (HUD).
The FHFA was established by the Housing and Economic Recovery Act of 2008 (HERA) and is responsible for the effective supervision, regulation, and housing mission oversight of Fannie Mae, Freddie Mac (the Enterprises), Common Securitization Solutions, LLC (CSS), and the Federal Home Loan Bank System, which includes the 11 Federal Home Loan Banks (FHLBanks) and the Office of Finance. Since 2008, FHFA has also served as conservator of Fannie Mae and Freddie Mac.
Conforming Loan Limits are mortgage limits set annually (as required by HERA) by the FHFA. For a mortgage loan to be eligible for purchase by Fannie Mae or Freddie Mac, the loan amount must not exceed the loan limit. Mortgages exceeding the Conforming Loan Limit are referred to as "non-conforming loans" or "jumbo loans." While most counties use a single set of Conforming Loan Limits based on the number of units, counties with a high cost of living use higher Conforming Loan Limits. The FHFA analyzes the year-over-year change in average home prices in October of each year using the Monthly Interest Rate Survey (MIRS) to adjust the Conforming Loan Limits for the upcoming year.
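The rule described above, comparing a loan amount with the limit for its county type, can be sketched as a simple lookup. The limit values below are hypothetical round numbers rather than published FHFA figures. The county categories, the function name, and the choice to treat a loan exactly at the limit as conforming are also assumptions of this sketch.

```python
# Hypothetical one-unit limits in dollars; real values come from the FHFA tables.
LIMITS = {
    "standard": 600_000,
    "high_cost": 900_000,
}


def classify_loan(amount, county_type):
    """Label a loan as 'conforming' or 'jumbo' for the given county type."""
    limit = LIMITS[county_type]
    return "conforming" if amount <= limit else "jumbo"


print(classify_loan(550_000, "standard"))
print(classify_loan(750_000, "standard"))
print(classify_loan(750_000, "high_cost"))
```

The same loan amount can be conforming in a high-cost county and jumbo elsewhere, which is why the dataset is organized by county geography.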
Geospatial data in this feature service uses the Census 2010 County geographies.
To learn more about the FHFA, please visit: https://www.fhfa.gov/AboutUs
For more information about FHFA Conforming Loan Limits, please visit: https://www.fhfa.gov/DataTools/Downloads/Pages/Conforming-Loan-Limits.aspx
For questions about the spatial attribution of this dataset, please reach out to us at GISHelpdesk@hud.gov.
Date of Coverage: 2022
Data Dictionary: DD_FHFA Conforming Loan Limits
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average House Prices in Canada increased to 688800 CAD in October from 687600 CAD in September of 2025. This dataset includes a chart with historical data for Canada Average House Prices.
Legal disclaimer: https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BUSINESS PROBLEM-1
BACKGROUND: The Lending Club is a peer-to-peer lending site where members make loans to each other. The site makes anonymized data on loans and borrowers publicly available.
BUSINESS PROBLEM: Using Lending Club loans data, the team would like to test the hypotheses below on how different factors affect each other. (Hint: You may leverage hypothesis testing using statistical tests.)
a. Interest rate varies with loan amount (less interest is charged for high loan amounts).
b. Loan length directly affects interest rate.
c. Interest rate varies with the purpose of the loan.
d. There is a relationship between FICO scores and home ownership; that is, people who own their home will have higher FICO scores.
DATA AVAILABLE: LoansData.csv
The data have the following variables (with data type and explanation of meaning):
Amount.Requested - numeric. The amount (in dollars) requested in the loan application.
Amount.Funded.By.Investors - numeric. The amount (in dollars) loaned to the individual.
Interest.rate - character. The lending interest rate charged to the borrower.
Loan.length - character. The length of time (in months) of the loan.
Loan.Purpose - categorical variable. The purpose of the loan as stated by the applicant.
Debt.to.Income.Ratio - character. The % of the consumer's gross income going toward paying debts.
State - character. The abbreviation for the U.S. state of residence of the loan applicant.
Home.ownership - character. Indicates whether the applicant owns, rents, or has a mortgage.
Monthly.income - categorical. The monthly income of the applicant (in dollars).
FICO.range - categorical (expressed as a string label, e.g. "650-655"). A range indicating the applicant's FICO score.
Open.CREDIT.Lines - numeric. The number of open lines of credit at the time of application.
Revolving.CREDIT.Balance - numeric. The total amount outstanding on all lines of credit.
Inquiries.in.the.Last.6.Months - numeric. Number of credit inquiries in the previous 6 months.
Employment.Length - character. Length of time employed at current job.
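As a rough illustration of how hypothesis (c) from Business Problem 1 might be approached, the sketch below cleans the character-typed columns and computes a one-way ANOVA F statistic for interest rate across loan purposes. The rows are made-up placeholders, not values from LoansData.csv, and the text formats ("8.90%" for Interest.rate) are assumptions; only the "650-655" FICO label format comes from the description. The helper names are hypothetical.

```python
from collections import defaultdict


def parse_percent(text):
    """Convert a string such as '8.90%' to the float 8.9 (format is assumed)."""
    return float(text.strip().rstrip("%"))


def parse_fico_lower(label):
    """Return the lower bound of a FICO range label such as '650-655'."""
    return int(label.split("-")[0])


def anova_f(groups):
    """One-way ANOVA F statistic for a dict of {group name: list of values}."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    k, n = len(groups), len(values)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Hypothetical placeholder rows; real values would be read from LoansData.csv.
rows = [
    {"Loan.Purpose": "car", "Interest.rate": "7.90%"},
    {"Loan.Purpose": "car", "Interest.rate": "8.90%"},
    {"Loan.Purpose": "credit_card", "Interest.rate": "10.16%"},
    {"Loan.Purpose": "credit_card", "Interest.rate": "11.14%"},
    {"Loan.Purpose": "debt_consolidation", "Interest.rate": "12.12%"},
    {"Loan.Purpose": "debt_consolidation", "Interest.rate": "13.11%"},
]

rates_by_purpose = defaultdict(list)
for row in rows:
    rates_by_purpose[row["Loan.Purpose"]].append(parse_percent(row["Interest.rate"]))

f_stat = anova_f(rates_by_purpose)
print(f"F statistic for interest rate by loan purpose: {f_stat:.2f}")
```

A large F statistic suggests that mean interest rates differ between purposes; turning it into a p-value requires the F distribution, which a statistics library would normally supply.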
BUSINESS PROBLEM-2
BACKGROUND: When an order is placed by a customer of a small manufacturing company, a price quote must be developed for that order. Because each order is unique, quotes must be established on an order-by-order basis by a pricing expert. The price quote process is labor-intensive, as prices depend on many factors such as the part number, customer, geographic location, market, and order volume. The sales department manager is concerned that the pricing process is too complex, and that there might be too much variability in the quoted prices. An improvement team is tasked with studying and improving the pricing process. After interviewing experts to develop a better understanding of the current process, the team designed a study to determine if there is variability between pricing experts. That is, do different pricing experts provide different price quotes? Two randomly selected pricing experts, Mary and Barry, were asked to independently provide prices for twelve randomly selected orders. Each expert provided one price for each of the twelve orders.
BUSINESS PROBLEM: We would like to assess if there is any difference in the average price quotes provided by Mary and Barry.
DATA AVAILABLE: Price_Quotes.csv
The data set contains the order number, 1 through 12, and the price quotes by Mary and Barry for each order. Each row in the data set is the same order. Thus, Mary and Barry produced quotes for the same orders.
BUSINESS PROBLEM-3
BACKGROUND: The New Life Residential Treatment Facility is an NGO that treats teenagers who have shown signs of mental illness. It provides housing and supervision of teenagers who are making the transition from psychiatric hospitals back into the community. Because many of the teenagers were severely abused as children and have been involved with the juvenile justice system, behavioral problems are common at New Life. Employee pay is low and staff turnover (attrition) is high.
A reengineering program was instituted at New Life with the goals of lowering behavioral problems of the kids and decreasing employee turnover rates. As a part of this effort, the following changes were made:
Employee shifts were shortened from 10 hours to 8 hours each day.
Employees were motivated to become more involved in patient treatments. This included encouraging staff to run various therapeutic treatment sessions and allowing staff to have more say in program changes.
The activities budget was increased.
A facility-wide performance evaluation system was put into place that rewarded staff participation and innovation.
Management and staff instituted a program designed to raise expectations about appropriate behavior from the kids. This included strict compliance with reporting of behavioral violations, insistence o...
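For Business Problem 2, Mary and Barry quoted the same twelve orders, so a paired comparison is one natural choice. The sketch below computes a paired t statistic from the per-order differences. The quote values are invented placeholders, not the contents of Price_Quotes.csv, and `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample standard deviation.
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical placeholder quotes for twelve orders.
mary = [120, 98, 143, 87, 110, 132, 101, 95, 150, 118, 107, 125]
barry = [115, 99, 139, 84, 112, 128, 97, 96, 144, 113, 108, 119]

t_stat, df = paired_t(mary, barry)

# Two-sided 5% critical value of Student's t with 11 degrees of freedom.
T_CRITICAL_DF11 = 2.201
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > T_CRITICAL_DF11}")
```

Pairing removes the order-to-order variation in price, so the test focuses only on the difference between the two experts.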
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description: Insurance Claims Prediction
Introduction: In the insurance industry, accurately predicting the likelihood of claims is essential for risk assessment and policy pricing. However, insurance claims datasets frequently suffer from class imbalance, where the number of non-claims instances far exceeds that of actual claims. This class imbalance poses challenges for predictive modeling, often leading to biased models favoring the majority class, resulting in subpar performance for the minority class, which is typically of greater interest.
Dataset Overview: The dataset utilized in this project comprises historical data on insurance claims, encompassing a variety of information about the policyholders, their demographics, past claim history, and other pertinent features. The dataset is structured to facilitate predictive modeling tasks aimed at accurately identifying the likelihood of future insurance claims.
Key Features: 1. Policyholder Information: This includes demographic details such as age, gender, occupation, marital status, and geographical location. 2. Claim History: Information regarding past insurance claims, including claim amounts, types of claims (e.g., medical, automobile), frequency of claims, and claim durations. 3. Policy Details: Details about the insurance policies held by the policyholders, such as coverage type, policy duration, premium amount, and deductibles. 4. Risk Factors: Variables indicating potential risk factors associated with policyholders, such as credit score, driving record (for automobile insurance), health status (for medical insurance), and property characteristics (for home insurance). 5. External Factors: Factors external to the policyholders that may influence claim likelihood, such as economic indicators, weather conditions, and regulatory changes.
Objective: The primary objective of utilizing this dataset is to develop robust predictive models capable of accurately assessing the likelihood of insurance claims. By leveraging advanced machine learning techniques, such as classification algorithms and ensemble methods, the aim is to mitigate the effects of class imbalance and produce models that demonstrate high predictive performance across both majority and minority classes.
Application Areas: 1. Risk Assessment: Assessing the risk associated with insuring a particular policyholder based on their characteristics and historical claim behavior. 2. Policy Pricing: Determining appropriate premium amounts for insurance policies by estimating the expected claim frequency and severity. 3. Fraud Detection: Identifying fraudulent insurance claims by detecting anomalous patterns in claim submissions and policyholder behavior. 4. Customer Segmentation: Segmenting policyholders into distinct groups based on their risk profiles and insurance needs to tailor marketing strategies and policy offerings.
Conclusion: The insurance claims dataset serves as a valuable resource for developing predictive models aimed at enhancing risk management, policy pricing, and overall operational efficiency within the insurance industry. By addressing the challenges posed by class imbalance and leveraging the rich array of features available, organizations can gain valuable insights into insurance claim likelihood and make informed decisions to mitigate risk and optimize business outcomes.
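One common way to counter the class imbalance described above is to weight each class inversely to its frequency. The sketch below computes such weights with the "balanced" heuristic, n_samples / (n_classes * class_count). The label list is a made-up placeholder rather than the dataset's actual target column, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical placeholder labels: 94 non-claims (0) and 6 claims (1).
labels = [0] * 94 + [1] * 6

weights = balanced_class_weights(labels)
for cls, weight in sorted(weights.items()):
    print(f"class {cls}: weight {weight:.3f}")
```

Weights of this form can be passed to classifiers that accept per-class weights, so that errors on the rare claim class cost more during training.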
| Feature | Description |
|---|---|
| policy_id | Unique identifier for the insurance policy. |
| subscription_length | The duration for which the insurance policy is active. |
| customer_age | Age of the insurance policyholder, which can influence the likelihood of claims. |
| vehicle_age | Age of the vehicle insured, which may affect the probability of claims due to factors like wear and tear. |
| model | The model of the vehicle, which could impact the claim frequency due to model-specific characteristics. |
| fuel_type | Type of fuel the vehicle uses (e.g., Petrol, Diesel, CNG), which might influence the risk profile and claim likelihood. |
| max_torque, max_power | Engine performance characteristics that could relate to the vehicle’s mechanical condition and claim risks. |
| engine_type | The type of engine, which might have implications for maintenance and claim rates. |
| displacement, cylinder | Specifications related to the engine size and construction, affec... |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in Sweden was last recorded at 1.75 percent. This dataset provides the latest reported value for - Sweden Interest Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in Norway was last recorded at 4 percent. This dataset provides the latest reported value for - Norway Interest Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in Mexico was last recorded at 7.25 percent. This dataset provides - Mexico Interest Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://creativecommons.org/publicdomain/zero/1.0/
A simple yet challenging project: predict the housing price based on factors such as house area, bedrooms, furnished, nearness to mainroad, etc. The dataset is small, yet its complexity arises from strong multicollinearity. Can you overcome these obstacles and build a decent predictive model?
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102. Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
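The multicollinearity mentioned in the housing description can be quantified with variance inflation factors (VIFs), which equal the diagonal entries of the inverse of the predictors' correlation matrix. The sketch below is a minimal pure-Python version; the column names and values are invented placeholders, not rows from the housing dataset, and all function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {column name: VIF} for a dict of numeric predictor columns."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical placeholder predictors.
features = {
    "area": [7420, 8960, 9960, 7500, 6000, 5150, 4300, 3000],
    "bedrooms": [4, 4, 3, 4, 3, 2, 3, 2],
    "bathrooms": [2, 4, 2, 2, 1, 1, 2, 1],
}

for name, value in vif(features).items():
    print(f"{name}: VIF = {value:.2f}")
```

A frequently used rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity, although the threshold is a convention rather than a strict rule.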