License: https://creativecommons.org/publicdomain/zero/1.0/
A simple yet challenging project: predict the housing price from factors such as house area, number of bedrooms, furnishing status, and proximity to a main road. The dataset is small, yet its complexity arises from strong multicollinearity among the features. Can you overcome these obstacles and build a decent predictive model?
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102. Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
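Multicollinearity of the kind mentioned in the description is often quantified with the variance inflation factor (VIF). The following is a minimal sketch, assuming a model with exactly two predictors, in which case the VIF reduces to 1 / (1 - r²), where r is their Pearson correlation. The `area` and `bedrooms` values are invented for illustration and are not rows from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(xs, ys):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical values, not taken from the dataset.
area = [1200, 1500, 1800, 2400, 3000, 3600]
bedrooms = [2, 2, 3, 3, 4, 5]

r = pearson(area, bedrooms)
vif = vif_two_predictors(area, bedrooms)
print(f"r = {r:.3f}, VIF = {vif:.2f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign. With more than two predictors, each VIF would instead come from regressing one predictor on all the others.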
License: https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains 2000 rows of house-related data, representing various features that could influence house prices. Below, we discuss key aspects of the dataset, which include its structure, the choice of features, and potential use cases for analysis.
The dataset is designed to capture essential attributes for predicting house prices, including:
- Area: Square footage of the house, which is generally one of the most important predictors of price.
- Bedrooms & Bathrooms: The number of rooms in a house significantly affects its value. Homes with more rooms tend to be priced higher.
- Floors: The number of floors in a house could indicate a larger, more luxurious home, potentially raising its price.
- Year Built: The age of the house can affect its condition and value. Newly built houses are generally more expensive than older ones.
- Location: Houses in desirable locations such as downtown or urban areas tend to be priced higher than those in suburban or rural areas.
- Condition: The current condition of the house is critical, as well-maintained houses (in 'Excellent' or 'Good' condition) will attract higher prices compared to houses in 'Fair' or 'Poor' condition.
- Garage: Availability of a garage can increase the price due to added convenience and space.
- Price: The target variable, representing the sale price of the house, used to train machine learning models to predict house prices based on the other features.
- Area Distribution: The area of the houses in the dataset ranges from 500 to 5000 square feet, which allows analysis across different types of homes, from smaller apartments to larger luxury houses.
- Bedrooms and Bathrooms: The number of bedrooms varies from 1 to 5, and bathrooms from 1 to 4. This variance enables analysis of homes with different sizes and layouts.
- Floors: Houses in the dataset have between 1 and 3 floors. This feature could be useful for identifying the influence of multi-level homes on house prices.
- Year Built: The dataset contains houses built from 1900 to 2023, giving a wide range of house ages to analyze the effects of new vs. older construction.
- Location: There is a mix of urban, suburban, downtown, and rural locations. Urban and downtown homes may command higher prices due to proximity to amenities.
- Condition: Houses are labeled as 'Excellent', 'Good', 'Fair', or 'Poor'. This feature helps model the price differences based on the current state of the house.
- Price Distribution: Prices range between $50,000 and $1,000,000, offering a broad spectrum of property values. This range makes the dataset appropriate for predicting a wide variety of housing prices, from affordable homes to luxury properties.
3. Correlation Between Features
A key area of interest is the relationship between various features and house price:
- Area and Price: Typically, a strong positive correlation is expected between the size of the house (Area) and its price. Larger homes are likely to be more expensive.
- Location and Price: Location is another major factor. Houses in urban or downtown areas may show a higher price on average compared to suburban and rural locations.
- Condition and Price: The condition of the house should show a positive correlation with price. Houses in better condition should be priced higher, as they require less maintenance and repair.
- Year Built and Price: Newer houses might command a higher price due to better construction standards, modern amenities, and less wear-and-tear, but some older homes in good condition may retain historical value.
- Garage and Price: A house with a garage may be more expensive than one without, as it provides extra storage or parking space.
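The relationships described above could be checked along the following lines. This is a minimal sketch using only the Python standard library: it computes the Pearson correlation between area and price and the mean price per location. The rows and the dictionary keys (`area`, `location`, `price`) are hypothetical stand-ins, not records or column names confirmed by the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (area in sq ft, price in USD); not real data.
houses = [
    {"area": 800, "location": "Rural", "price": 120_000},
    {"area": 1500, "location": "Suburban", "price": 260_000},
    {"area": 2200, "location": "Urban", "price": 410_000},
    {"area": 3000, "location": "Downtown", "price": 640_000},
    {"area": 1200, "location": "Rural", "price": 150_000},
    {"area": 2600, "location": "Urban", "price": 520_000},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Numeric feature vs. price: correlation coefficient.
area_price_r = pearson([h["area"] for h in houses], [h["price"] for h in houses])

# Categorical feature vs. price: compare group means.
prices_by_location = defaultdict(list)
for house in houses:
    prices_by_location[house["location"]].append(house["price"])
mean_price_by_location = {loc: mean(p) for loc, p in prices_by_location.items()}

print(f"area/price correlation: {area_price_r:.3f}")
print(mean_price_by_location)
```

Group means are used for location because it is a categorical feature, for which a correlation coefficient is not directly defined.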
The dataset is well-suited for various machine learning and data analysis applications, including:
- House Price Prediction: Using regression techniques, this dataset can be used to build a model to predict house prices based on the available features.
- Feature Importance Analysis: By using techniques such as feature importance ranking, data scientists can determine which features (e.g., location, area, or condition) have the greatest impact on house prices.
- Clustering: Clustering techniques like k-means could help identify patterns in the data, such as grouping houses into segments based on their characteristics (e.g., luxury homes, affordable homes).
- Market Segmentation: The dataset can be used to perform segmentation by location, price range, or house type to analyze trends in specific sub-markets, like luxury vs. affordable housing.
- Time-Based Analysis: By studying how house prices vary with the year built or the age of the house, analysts can derive insights into the trends of older vs. newer homes.
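As an illustration of the house price prediction use case, here is a minimal sketch of ordinary least squares with a single feature (area), written with the standard library only. The numbers are invented so that price is exactly 100 × area + 50,000; a real model would use more features and a dedicated library.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: area in square feet, price in USD.
area = [1000, 2000, 3000, 4000]
price = [150_000, 250_000, 350_000, 450_000]

slope, intercept = fit_simple_linear_regression(area, price)
predicted_price = slope * 2500 + intercept  # prediction for a 2,500 sq ft house
print(slope, intercept, predicted_price)
```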
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Average house prices are derived from data supplied by the mortgage lending agencies on loans approved by them rather than loans paid. In comparing house price figures from one period to another, account should be taken of the fact that changes in the mix of houses (incl. apartments) will affect the average figures. The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change. Excluding apartments, measured in €. Figure changed on 27/6/16 as revised data was received from the local authority.
License: Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0), https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
This dataset contains various features of residential properties along with their corresponding prices. It is suitable for exploring and analyzing factors influencing housing prices and for building predictive models to estimate the price of a property based on its attributes.
| Feature | Description |
|---|---|
| price | The price of the property. |
| area | The total area of the property in square feet. |
| bedrooms | The number of bedrooms in the property. |
| bathrooms | The number of bathrooms in the property. |
| stories | The number of stories (floors) in the property. |
| mainroad | Indicates whether the property is located on a main road (binary: yes/no). |
| guestroom | Indicates whether the property has a guest room (binary: yes/no). |
| basement | Indicates whether the property has a basement (binary: yes/no). |
| hotwaterheating | Indicates whether the property has hot water heating (binary: yes/no). |
| airconditioning | Indicates whether the property has air conditioning (binary: yes/no). |
| parking | The number of parking spaces available with the property. |
| prefarea | Indicates whether the property is in a preferred area (binary: yes/no). |
| furnishingstatus | The furnishing status of the property (e.g., furnished, semi-furnished, unfurnished). |
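Most models need the yes/no and furnishing columns in numeric form. The following is a minimal sketch of one way to do that, assuming the column names from the table above and the three furnishing levels given there as examples. The sample row is hypothetical.

```python
NUMERIC_COLUMNS = ["area", "bedrooms", "bathrooms", "stories", "parking"]
BINARY_COLUMNS = ["mainroad", "guestroom", "basement",
                  "hotwaterheating", "airconditioning", "prefarea"]
# Assumed levels, taken from the examples in the table description.
FURNISHING_LEVELS = ["furnished", "semi-furnished", "unfurnished"]


def encode_row(row):
    """Map yes/no to 1/0 and one-hot encode furnishingstatus."""
    encoded = {name: float(row[name]) for name in NUMERIC_COLUMNS}
    for name in BINARY_COLUMNS:
        encoded[name] = 1 if row[name].strip().lower() == "yes" else 0
    status = row["furnishingstatus"].strip().lower()
    for level in FURNISHING_LEVELS:
        encoded[f"furnishingstatus_{level}"] = 1 if status == level else 0
    return encoded


# Hypothetical row, as it might be read from a CSV file (all values are strings).
sample_row = {
    "area": "7420", "bedrooms": "4", "bathrooms": "2", "stories": "3",
    "parking": "2", "mainroad": "yes", "guestroom": "no", "basement": "no",
    "hotwaterheating": "no", "airconditioning": "yes", "prefarea": "yes",
    "furnishingstatus": "semi-furnished",
}

encoded_row = encode_row(sample_row)
print(encoded_row)
```

For linear models it is common to drop one of the one-hot columns, since the three indicators always sum to 1 and would otherwise be collinear with the intercept.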
License: This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
License: https://crawlfeeds.com/privacy_policy
The Housing Data Extracted from Homes.com (USA) dataset is a comprehensive collection of 2 million real estate listings sourced from Homes.com, one of the leading real estate platforms in the United States. This dataset offers detailed insights into the U.S. housing market, making it an invaluable resource for real estate professionals, investors, researchers, and analysts.
The dataset contains extensive property details, including location, price, property type (single-family homes, condos, apartments), number of bedrooms and bathrooms, square footage, lot size, year built, and availability status. Organized in CSV format, it provides users with easy access to structured data for analyzing trends, developing investment strategies, or building real estate applications.
License: https://brightdata.com/license
Gain a complete view of the real estate market with our Zillow datasets. Track price trends, rental/sale status, and price per square foot with the Zillow Price History dataset, and explore detailed listings with prices, locations, and features using the Zillow Properties Listing dataset.
- Over 134M records available
- Price starts at $250/100K records
- Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet
- 100% ethical and compliant data collection
Included datapoints:
Zpid
City
State
Home Status
Street Address
Zipcode
Home Type
Living Area Value
Bedrooms
Bathrooms
Price
Property Type
Date Sold
Annual Homeowners Insurance
Price Per Square Foot
Rent Zestimate
Tax Assessed Value
Zestimate
Home Values
Lot Area
Lot Area Unit
Living Area
Living Area Units
Property Tax Rate
Page View Count
Favorite Count
Time On Zillow
Time Zone
Abbreviated Address
Brokerage Name
And much more
License: https://crawlfeeds.com/privacy_policy
Explore the Redfin USA Properties Dataset, available in CSV format. This extensive dataset provides valuable insights into the U.S. real estate market, including detailed property listings, prices, property types, and more across various states and cities. Perfect for those looking to conduct in-depth market analysis, real estate investment research, or financial forecasting.
Download the Redfin USA Properties Dataset to access essential information on the U.S. housing market, ideal for professionals in real estate, finance, and data analytics. Unlock key insights to make informed decisions in a dynamic market environment.
License: https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Median Sales Price of Houses Sold for the United States (MSPUS) from Q1 1963 to Q2 2025 about sales, median, housing, and USA.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average House Prices in the United States increased to 534100 USD in August from 478200 USD in July of 2025. This dataset includes a chart with historical data for the United States New Home Average Sales Price.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table shows the average purchase price paid in the reporting period for existing own homes purchased by a private individual. The average purchase price of existing own homes may differ from the price index of existing own homes. The average purchase price is not an indicator of price developments of owner-occupied residential property. The average purchase price reflects the average price of dwellings sold in a particular period. The fact that the dwellings sold differ from one period to another is not taken into account. The following example illustrates the problems caused by the continually changing quality of the dwellings sold. Suppose that in February of a particular year mainly big houses with extensive gardens, beautifully situated alongside canals, are sold, whereas in March many small terraced houses are sold. In that case the average purchase price in February will be higher than in March, but this does not mean that house prices have increased. See note 3 for a link to the article 'Why the average purchase price is not an indicator'.
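The composition effect described above can be made concrete with a small calculation. This is an illustrative sketch with invented prices: each house type sells for the same amount in both months, and only the mix of dwellings sold changes.

```python
from statistics import mean

# Hypothetical sale prices in euros; the price of each type does not change.
BIG_HOUSE = 600_000
TERRACED_HOUSE = 250_000

# February: mostly big houses. March: mostly small terraced houses.
february_sales = [BIG_HOUSE] * 8 + [TERRACED_HOUSE] * 2
march_sales = [BIG_HOUSE] * 2 + [TERRACED_HOUSE] * 8

february_average = mean(february_sales)
march_average = mean(march_sales)
print(february_average, march_average)
```

The average falls from 530,000 to 320,000 even though no individual house changed in price, which is why the average purchase price alone says little about price developments.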
Data available from: 1995
Status of the figures: The figures in this table are immediately definitive. The calculation of these figures is based on the number of notary transactions that are registered every month by the Dutch Land Registry Office (Kadaster). A revision of the figures is exceptional and occurs specifically if an error significantly exceeds the acceptable statistical margins. The average purchasing prices of existing owner-occupied sold homes can be calculated by Kadaster at a later date. These figures are usually the same as the publication on Statline, but in some periods they differ. Kadaster calculates the average purchasing prices based on the most recent data. These may have changed since the first publication. Statistics Netherlands uses figures from the first publication in accordance with the revision policy described above.
Changes as of 17 February 2025: Added average purchase prices of the municipalities for the year 2024.
When will new figures be published? New figures are published approximately one to three months after the period under review.
License: Open Government Licence 3.0, http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Median price paid for residential property in England and Wales, by property type and administrative geographies. Annual data.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single Family Home Prices in the United States increased to 415200 USD in October from 412300 USD in September of 2025. This dataset provides - United States Existing Single Family Home Prices- actual values, historical data, forecast, chart, statistics, economic calendar and news.
Redfin is a real estate brokerage and publishes US housing market data on a regular basis. Using this dataset, you can analyze and visualize housing market data for US cities. Timeline: Starting from February 2012 until the present time (data is refreshed and updated on a monthly basis).
The dataset has the following columns:
- period_begin
- period_end
- period_duration
- region_type
- region_type_id
- table_id
- is_seasonally_adjusted (indicates if prices are seasonally adjusted; f represents False)
- region
- city
- state
- state_code
- property_type
- property_type_id
- median_sale_price
- median_sale_price_mom (median sale price changes month over month)
- median_sale_price_yoy (median sale price changes year over year)
- median_list_price
- median_list_price_mom (median list price changes month over month)
- median_list_price_yoy (median list price changes year over year)
- median_ppsf (median sale price per square foot)
- median_ppsf_mom (median sale price per square foot changes month over month)
- median_ppsf_yoy (median sale price per square foot changes year over year)
- median_list_ppsf (median list price per square foot)
- median_list_ppsf_mom (median list price per square foot changes month over month)
- median_list_ppsf_yoy (median list price per square foot changes year over year)
- homes_sold (number of homes sold)
- homes_sold_mom (number of homes sold month over month)
- homes_sold_yoy (number of homes sold year over year)
- pending_sales
- pending_sales_mom
- pending_sales_yoy
- new_listings
- new_listings_mom
- new_listings_yoy
- inventory
- inventory_mom
- inventory_yoy
- months_of_supply
- months_of_supply_mom
- months_of_supply_yoy
- median_dom (median days on market until property is sold)
- median_dom_mom (median days on market changes month over month)
- median_dom_yoy (median days on market changes year over year)
- avg_sale_to_list (average sale price to list price ratio)
- avg_sale_to_list_mom (average sale price to list price ratio changes month over month)
- avg_sale_to_list_yoy (average sale price to list price ratio changes year over year)
- sold_above_list
- sold_above_list_mom
- sold_above_list_yoy
- price_drops
- price_drops_mom
- price_drops_yoy
- off_market_in_two_weeks (number of properties that will be taken off the market within 2 weeks)
- off_market_in_two_weeks_mom (changes in number of properties that will be taken off the market within 2 weeks, month over month)
- off_market_in_two_weeks_yoy (changes in number of properties that will be taken off the market within 2 weeks, year over year)
- parent_metro_region
- parent_metro_region_metro_code
- last_updated
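The `_mom` and `_yoy` columns describe changes relative to the previous month and to the same month a year earlier. A minimal sketch of that calculation follows, assuming the change is expressed as a fraction of the earlier value; the source does not state the exact convention, and the monthly prices are invented.

```python
def relative_change(current, previous):
    """Fractional change from previous to current, or None if undefined."""
    if current is None or previous is None or previous == 0:
        return None
    return (current - previous) / previous


# Hypothetical monthly median sale prices, oldest first (13 months).
median_sale_price = [
    400_000, 402_000, 405_000, 410_000, 415_000, 420_000, 425_000,
    423_000, 421_000, 418_000, 416_000, 418_000, 420_000,
]

latest = median_sale_price[-1]
mom = relative_change(latest, median_sale_price[-2])   # vs. the previous month
yoy = relative_change(latest, median_sale_price[-13])  # vs. the same month a year earlier
print(f"mom = {mom:.4f}, yoy = {yoy:.4f}")
```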
Filetype: gzip (gz). Support for gzip files in Python: https://docs.python.org/3/library/gzip.html
Data Source & Credit: Redfin.com
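Since the file is distributed as gzip, it can be read without unpacking it first. The sketch below uses Python's `gzip` and `csv` modules. It assumes a tab-delimited text file, which the description does not state, so the delimiter is a parameter. It also builds a tiny stand-in file so it runs without the real download; the file name and the sample row are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_table(path, delimiter="\t"):
    """Yield each row of a gzip-compressed delimited text file as a dict."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)


# Build a tiny stand-in file so the sketch runs without the real data.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["period_begin", "city", "median_sale_price"])
    writer.writerow(["2012-02-01", "Example City", "250000"])

rows = list(read_gzipped_table(sample_path))
print(rows[0]["city"], rows[0]["median_sale_price"])
```

All values come back as strings, so numeric columns such as `median_sale_price` need converting before analysis.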
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
House Price Index YoY in the United States decreased to 1.70 percent in September from 2.40 percent in August of 2025. This dataset includes a chart with historical data for the United States FHFA House Price Index YoY.
This table contains data on the percent of households paying more than 30% (or 50%) of monthly household income towards housing costs for California, its regions, counties, cities/towns, and census tracts. Data is from the U.S. Department of Housing and Urban Development (HUD), Consolidated Planning Comprehensive Housing Affordability Strategy (CHAS) and the U.S. Census Bureau, American Community Survey (ACS). The table is part of a series of indicators in the Healthy Communities Data and Indicators Project of the Office of Health Equity.
Affordable, quality housing is central to health, conferring protection from the environment and supporting family life. Housing costs, typically the largest single expense in a family's budget, also impact decisions that affect health. As housing consumes larger proportions of household income, families have less income for nutrition, health care, transportation, education, etc. Severe cost burdens may induce poverty, which is associated with developmental and behavioral problems in children and accelerated cognitive and physical decline in adults. Low-income families and minority communities are disproportionately affected by the lack of affordable, quality housing. More information about the data table and a data dictionary can be found in the Attachments.
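The indicator itself is a simple share of households above a cost-to-income threshold. The following sketch shows that calculation on invented household records; it assumes every household has a positive income, and the published figures may apply further rules that are documented only in the data dictionary.

```python
def percent_over_threshold(households, threshold):
    """Percent of households whose housing cost exceeds threshold * income."""
    over = sum(1 for cost, income in households if cost / income > threshold)
    return 100.0 * over / len(households)


# Hypothetical (monthly housing cost, monthly household income) pairs in USD.
households = [(900, 4000), (1500, 4000), (2200, 4000), (1300, 5000)]

percent_over_30 = percent_over_threshold(households, 0.30)
percent_over_50 = percent_over_threshold(households, 0.50)
print(percent_over_30, percent_over_50)
```

Households above the 50% threshold are also counted in the 30% figure, so the second number can never exceed the first.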
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated for analyzing the economic impacts of subway networks on housing prices in metropolitan areas. The provision of transit networks and accompanying improvement in accessibility induce various impacts and we focused on the economic impacts realized through housing prices. As a proxy of housing price, we consider the price of condominiums, the dominant housing type in South Korea. Although our focus is transit accessibility and housing prices, the presented dataset is applicable to other studies. In particular, it provides a wide range of variables closely related to housing price, including housing properties, local amenities, local demographic characteristics, and control variables for the seasonality. Many of these variables were scientifically generated by our research team. Various distance variables were constructed in a geographic information system environment based on public data and they are useful not only for exploring environmental impacts on housing prices, but also for other statistical analyses in regard to real estate and social science research. The four metropolitan areas covered by the data—Busan, Daegu, Daejeon, and Gwangju—are independent of the transit systems of Greater Seoul, providing accurate information on the metropolitan structure separate from the capital city.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This dataset was collected and curated to support research on predicting real estate prices using machine learning algorithms, specifically Support Vector Regression (SVR) and Gradient Boosting Machine (GBM). The dataset includes comprehensive information on residential properties, enabling the development and evaluation of predictive models for accurate and transparent real estate appraisals.
Data Source: The data was sourced from Department of Lands and Survey real estate listings.
Features: The dataset contains the following key attributes for each property:
- Area (in square meters): The total living area of the property.
- Floor Number: The floor on which the property is located.
- Location: Geographic coordinates or city/region where the property is situated.
- Type of Apartment: The classification of the property, such as studio, one-bedroom, two-bedroom, etc.
- Number of Bathrooms: The total number of bathrooms in the property.
- Number of Bedrooms: The total number of bedrooms in the property.
- Property Age (in years): The number of years since the property was constructed.
- Property Condition: A categorical variable indicating the condition of the property (e.g., new, good, fair, needs renovation).
- Proximity to Amenities: The distance to nearby amenities such as schools, hospitals, shopping centers, and public transportation.
- Market Price (target variable): The actual sale price or listed price of the property.
Data Preprocessing:
- Normalization: Numeric features such as area and proximity to amenities were normalized to ensure consistency and improve model performance.
- Categorical Encoding: Categorical features like property condition and type of apartment were encoded using one-hot encoding or label encoding, depending on the specific model requirements.
- Missing Values: Missing data points were handled using appropriate imputation techniques or by excluding records with significant missing information.
Usage: This dataset was utilized to train and test machine learning models, aiming to predict the market price of residential properties based on the provided attributes. The models developed using this dataset demonstrated improved accuracy and transparency over traditional appraisal methods.
Dataset Availability: The dataset is available for public use under CC BY 4.0. Users are encouraged to cite the related publication when using the data in their research or applications.
Citation: If you use this dataset in your research, please cite the following publication: Real Estate Decision-Making: Precision in Price Prediction through Advanced Machine Learning Algorithms.
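The preprocessing steps are described only in general terms, so the following is a sketch of one plausible combination rather than the authors' pipeline: mean imputation for missing values followed by min-max normalization. Both choices are assumptions, and the sample areas are invented.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - low) / (high - low) for v in values]


# Hypothetical living areas in square meters, with one missing value.
area_m2 = [50, None, 100, 150]

area_filled = impute_mean(area_m2)
area_scaled = min_max_scale(area_filled)
print(area_filled, area_scaled)
```

Scaling matters more for SVR, which is sensitive to feature magnitudes, than for tree-based gradient boosting, which is largely unaffected by monotonic rescaling.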
The FHFA House Price Index (FHFA HPI®) is the nation’s only collection of public, freely available house price indexes that measure changes in single-family home values based on data from all 50 states and over 400 American cities that extend back to the mid-1970s. The FHFA HPI incorporates tens of millions of home sales and offers insights about house price fluctuations at the national, census division, state, metro area, county, ZIP code, and census tract levels. FHFA uses a fully transparent methodology based upon a weighted, repeat-sales statistical technique to analyze house price transaction data.
What does the FHFA HPI represent? The FHFA HPI is a broad measure of the movement of single-family house prices. The FHFA HPI is a weighted, repeat-sales index, meaning that it measures average price changes in repeat sales or refinancings on the same properties. This information is obtained by reviewing repeat mortgage transactions on single-family properties whose mortgages have been purchased or securitized by Fannie Mae or Freddie Mac since January 1975. The FHFA HPI serves as a timely, accurate indicator of house price trends at various geographic levels. Because of the breadth of the sample, it provides more information than is available in other house price indexes. It also provides housing economists with an improved analytical tool that is useful for estimating changes in the rates of mortgage defaults, prepayments and housing affordability in specific geographic areas.
U.S. Federal Housing Finance Agency, All-Transactions House Price Index for Connecticut [CTSTHPI], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/CTSTHPI, August 2, 2023.
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
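The two metrics named above are straightforward to compute by hand. A minimal sketch with invented actual and predicted prices follows; in practice a library implementation would normally be used.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted house prices in USD.
actual = [200_000, 300_000, 400_000, 500_000]
predicted = [210_000, 290_000, 420_000, 480_000]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mae, r2)
```

MAE is expressed in the same units as the price, which makes it easy to interpret, while R-squared is unitless and indicates the share of price variance the model explains.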
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.