License: CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
A simple yet challenging project: predict the housing price from factors such as house area, number of bedrooms, furnishing status, and nearness to the main road. The dataset is small, yet its complexity arises from strong multicollinearity among the features. Can you overcome these obstacles and build a decent predictive model?
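One common way to quantify multicollinearity, offered here as an assumption rather than the dataset author's method, is the variance inflation factor (VIF): regress each feature on all the others and compute 1 / (1 - R^2). Values above roughly 5 to 10 are often treated as a warning sign. The data below is synthetic and the column roles (area, bedrooms, noise) are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return one VIF per column of the 2-D feature matrix X (rows = houses)."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept column.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # VIF = 1 / (1 - R^2); a perfect linear dependence gives an infinite VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic illustration (not the real dataset): "bedrooms" is built almost
# entirely from "area", while the third column is independent noise.
rng = np.random.default_rng(0)
area = rng.uniform(500, 5000, size=500)
bedrooms = area / 1000 + rng.normal(0, 0.1, size=500)
noise = rng.normal(size=500)
X = np.column_stack([area, bedrooms, noise])
vifs = variance_inflation_factors(X)
```

Here the first two columns get very large VIFs because each is nearly a linear function of the other, while the independent column stays close to 1.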
License: CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains 2000 rows of house-related data, representing various features that could influence house prices. Below, we discuss key aspects of the dataset, which include its structure, the choice of features, and potential use cases for analysis.
The dataset is designed to capture essential attributes for predicting house prices, including:
- Area: Square footage of the house, which is generally one of the most important predictors of price.
- Bedrooms & Bathrooms: The number of rooms in a house significantly affects its value. Homes with more rooms tend to be priced higher.
- Floors: The number of floors in a house could indicate a larger, more luxurious home, potentially raising its price.
- Year Built: The age of the house can affect its condition and value. Newly built houses are generally more expensive than older ones.
- Location: Houses in desirable locations such as downtown or urban areas tend to be priced higher than those in suburban or rural areas.
- Condition: The current condition of the house is critical, as well-maintained houses (in 'Excellent' or 'Good' condition) tend to attract higher prices than houses in 'Fair' or 'Poor' condition.
- Garage: Availability of a garage can increase the price due to added convenience and space.
- Price: The target variable, representing the sale price of the house, used to train machine learning models to predict house prices based on the other features.
- Area Distribution: The area of the houses in the dataset ranges from 500 to 5000 square feet, which allows analysis across different types of homes, from smaller apartments to larger luxury houses.
- Bedrooms and Bathrooms: The number of bedrooms varies from 1 to 5, and bathrooms from 1 to 4. This variance enables analysis of homes with different sizes and layouts.
- Floors: Houses in the dataset have between 1 and 3 floors. This feature could be useful for identifying the influence of multi-level homes on house prices.
- Year Built: The dataset contains houses built from 1900 to 2023, giving a wide range of house ages to analyze the effects of new vs. older construction.
- Location: There is a mix of urban, suburban, downtown, and rural locations. Urban and downtown homes may command higher prices due to proximity to amenities.
- Condition: Houses are labeled as 'Excellent', 'Good', 'Fair', or 'Poor'. This feature helps model the price differences based on the current state of the house.
- Price Distribution: Prices range between $50,000 and $1,000,000, offering a broad spectrum of property values. This range makes the dataset appropriate for predicting a wide variety of housing prices, from affordable homes to luxury properties.
3. Correlation Between Features
A key area of interest is the relationship between various features and house price:
- Area and Price: Typically, a strong positive correlation is expected between the size of the house (Area) and its price. Larger homes are likely to be more expensive.
- Location and Price: Location is another major factor. Houses in urban or downtown areas may show a higher price on average compared to suburban and rural locations.
- Condition and Price: The condition of the house should show a positive correlation with price. Houses in better condition should be priced higher, as they require less maintenance and repair.
- Year Built and Price: Newer houses might command a higher price due to better construction standards, modern amenities, and less wear and tear, but some older homes in good condition may retain historical value.
- Garage and Price: A house with a garage may be more expensive than one without, as it provides extra storage or parking space.
The dataset is well-suited for various machine learning and data analysis applications, including:
- House Price Prediction: Using regression techniques, this dataset can be used to build a model to predict house prices based on the available features.
- Feature Importance Analysis: By using techniques such as feature importance ranking, data scientists can determine which features (e.g., location, area, or condition) have the greatest impact on house prices.
- Clustering: Clustering techniques like k-means could help identify patterns in the data, such as grouping houses into segments based on their characteristics (e.g., luxury homes, affordable homes).
- Market Segmentation: The dataset can be used to perform segmentation by location, price range, or house type to analyze trends in specific sub-markets, like luxury vs. affordable housing.
- Time-Based Analysis: By studying how house prices vary with the year built or the age of the house, analysts can derive insights into the trends of older vs. newer homes.
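For the clustering use case, here is a minimal NumPy sketch of k-means (Lloyd's algorithm). It assumes numeric features that are standardized first, because area and price are on very different scales. The two synthetic groups stand in for hypothetical "affordable" and "luxury" segments.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic example: 50 "affordable" and 50 "luxury" homes as (area, price).
rng = np.random.default_rng(1)
affordable = np.column_stack([rng.normal(1_000, 100, 50), rng.normal(150_000, 20_000, 50)])
luxury = np.column_stack([rng.normal(4_000, 300, 50), rng.normal(900_000, 50_000, 50)])
raw = np.vstack([affordable, luxury])
# Standardize so square feet and dollars contribute comparably to distances.
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)
labels, centroids = kmeans(X, k=2)
```

On real data the number of segments `k` is a modeling choice, and results can depend on the random initialization, so running several seeds is a reasonable precaution.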
License: Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0), https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
This dataset contains various features of residential properties along with their corresponding prices. It is suitable for exploring and analyzing factors influencing housing prices and for building predictive models to estimate the price of a property based on its attributes.
| Feature | Description |
|---|---|
| price | The price of the property. |
| area | The total area of the property in square feet. |
| bedrooms | The number of bedrooms in the property. |
| bathrooms | The number of bathrooms in the property. |
| stories | The number of stories (floors) in the property. |
| mainroad | Indicates whether the property is located on a main road (binary: yes/no). |
| guestroom | Indicates whether the property has a guest room (binary: yes/no). |
| basement | Indicates whether the property has a basement (binary: yes/no). |
| hotwaterheating | Indicates whether the property has hot water heating (binary: yes/no). |
| airconditioning | Indicates whether the property has air conditioning (binary: yes/no). |
| parking | The number of parking spaces available with the property. |
| prefarea | Indicates whether the property is in a preferred area (binary: yes/no). |
| furnishingstatus | The furnishing status of the property (e.g., furnished, semi-furnished, unfurnished). |
License: This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
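Most of these columns are yes/no strings, so they need encoding before a regression model can use them. A minimal sketch, assuming rows are read as strings (for example with `csv.DictReader`) and that `furnishingstatus` uses exactly the spellings shown in the table; the function and constant names are illustrative. Leaving one furnishing level out as the reference category also avoids adding a perfectly collinear set of dummy columns, which matters for a dataset already described as strongly multicollinear.

```python
BINARY_COLUMNS = ["mainroad", "guestroom", "basement",
                  "hotwaterheating", "airconditioning", "prefarea"]
NUMERIC_COLUMNS = ["price", "area", "bedrooms", "bathrooms", "stories", "parking"]
# Assumed spellings of the furnishingstatus values; check them against the CSV.
FURNISHING_LEVELS = ["furnished", "semi-furnished", "unfurnished"]


def encode_row(row: dict) -> dict:
    """Convert one raw record (string values) into purely numeric features."""
    encoded = {col: float(row[col]) for col in NUMERIC_COLUMNS}
    # yes/no flags become 1/0.
    for col in BINARY_COLUMNS:
        encoded[col] = 1 if row[col].strip().lower() == "yes" else 0
    # One-hot encode furnishingstatus, leaving out "unfurnished" as the reference
    # level: all three dummies together would sum to 1 in every row, which is
    # perfectly collinear with a linear model's intercept.
    status = row["furnishingstatus"].strip().lower()
    for level in FURNISHING_LEVELS[:-1]:
        encoded[f"furnishing_{level}"] = 1 if status == level else 0
    return encoded


# Hypothetical record, values made up for illustration.
sample = {
    "price": "5000000", "area": "6000", "bedrooms": "3", "bathrooms": "2",
    "stories": "2", "mainroad": "yes", "guestroom": "no", "basement": "yes",
    "hotwaterheating": "no", "airconditioning": "yes", "parking": "1",
    "prefarea": "no", "furnishingstatus": "semi-furnished",
}
encoded = encode_row(sample)
```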
This dataset contains prices of New York houses, providing valuable insights into the real estate market in the region. It includes information such as broker titles, house types, prices, number of bedrooms and bathrooms, property square footage, addresses, state, administrative and local areas, street names, and geographical coordinates.
- BROKERTITLE: Title of the broker
- TYPE: Type of the house
- PRICE: Price of the house
- BEDS: Number of bedrooms
- BATH: Number of bathrooms
- PROPERTYSQFT: Square footage of the property
- ADDRESS: Full address of the house
- STATE: State of the house
- MAIN_ADDRESS: Main address information
- ADMINISTRATIVE_AREA_LEVEL_2: Administrative area level 2 information
- LOCALITY: Locality information
- SUBLOCALITY: Sublocality information
- STREET_NAME: Street name
- LONG_NAME: Long name
- FORMATTED_ADDRESS: Formatted address
- LATITUDE: Latitude coordinate of the house
- LONGITUDE: Longitude coordinate of the house
- Price analysis: Analyze the distribution of house prices to understand market trends and identify potential investment opportunities.
- Property size analysis: Explore the relationship between property square footage and prices to assess the value of different-sized houses.
- Location-based analysis: Investigate geographical patterns to identify areas with higher or lower property prices.
- Bedroom and bathroom trends: Analyze the impact of the number of bedrooms and bathrooms on house prices.
- Broker performance analysis: Evaluate the influence of different brokers on the pricing of houses.
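For the location-based and property-size analyses, one simple starting point is the median price per square foot in each area. This sketch assumes the CSV uses the column names listed above and groups by `SUBLOCALITY`; the sample rows and the function name are invented for illustration. The median is used instead of the mean so that a handful of very expensive listings does not dominate the summary.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# A tiny hypothetical extract using the column names listed above.
SAMPLE = """PRICE,PROPERTYSQFT,SUBLOCALITY
850000,1000,Brooklyn
1200000,1500,Brooklyn
2400000,1200,Manhattan
3000000,1500,Manhattan
450000,900,Queens
"""


def median_ppsf_by_area(rows):
    """Median price per square foot for each SUBLOCALITY."""
    groups = defaultdict(list)
    for row in rows:
        sqft = float(row["PROPERTYSQFT"])
        if sqft > 0:  # skip rows where the size is missing or zero
            groups[row["SUBLOCALITY"]].append(float(row["PRICE"]) / sqft)
    return {name: median(values) for name, values in groups.items()}


result = median_ppsf_by_area(csv.DictReader(io.StringIO(SAMPLE)))
```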
Redfin is a real estate brokerage that regularly publishes US housing market data. Using this dataset, you can analyze and visualize housing market data for US cities. Timeline: February 2012 to the present (the data is refreshed monthly).
The dataset has the following columns:
- period_begin
- period_end
- period_duration
- region_type
- region_type_id
- table_id
- is_seasonally_adjusted (indicates if prices are seasonally adjusted; f represents False)
- region
- city
- state
- state_code
- property_type
- property_type_id
- median_sale_price
- median_sale_price_mom (median sale price changes month over month)
- median_sale_price_yoy (median sale price changes year over year)
- median_list_price
- median_list_price_mom (median list price changes month over month)
- median_list_price_yoy (median list price changes year over year)
- median_ppsf (median sale price per square foot)
- median_ppsf_mom (median sale price per square foot changes month over month)
- median_ppsf_yoy (median sale price per square foot changes year over year)
- median_list_ppsf (median list price per square foot)
- median_list_ppsf_mom (median list price per square foot changes month over month)
- median_list_ppsf_yoy (median list price per square foot changes year over year)
- homes_sold (number of homes sold)
- homes_sold_mom (number of homes sold changes month over month)
- homes_sold_yoy (number of homes sold changes year over year)
- pending_sales
- pending_sales_mom
- pending_sales_yoy
- new_listings
- new_listings_mom
- new_listings_yoy
- inventory
- inventory_mom
- inventory_yoy
- months_of_supply
- months_of_supply_mom
- months_of_supply_yoy
- median_dom (median days on market until property is sold)
- median_dom_mom (median days on market changes month over month)
- median_dom_yoy (median days on market changes year over year)
- avg_sale_to_list (average sale price to list price ratio)
- avg_sale_to_list_mom (average sale price to list price ratio changes month over month)
- avg_sale_to_list_yoy (average sale price to list price ratio changes year over year)
- sold_above_list
- sold_above_list_mom
- sold_above_list_yoy
- price_drops
- price_drops_mom
- price_drops_yoy
- off_market_in_two_weeks (number of properties that will be taken off the market within 2 weeks)
- off_market_in_two_weeks_mom (changes in number of properties that will be taken off the market within 2 weeks, month over month)
- off_market_in_two_weeks_yoy (changes in number of properties that will be taken off the market within 2 weeks, year over year)
- parent_metro_region
- parent_metro_region_metro_code
- last_updated
Filetype: gzip (.gz). Python supports gzip files through the standard-library `gzip` module: https://docs.python.org/3/library/gzip.html
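Because the file is gzip-compressed, Python's `gzip` module can stream it without unpacking it to disk first. This sketch assumes the file is tab-separated with a header row matching the columns above; verify the delimiter against the file you download. The function name and the state filter are illustrative.

```python
import csv
import gzip


def read_redfin_rows(path, state_code=None):
    """Stream rows (as dicts of strings) from a gzip-compressed, tab-separated file.

    Assumes the first line is a header containing the column names listed above.
    If state_code is given, only rows whose state_code matches are yielded.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if state_code is None or row["state_code"] == state_code:
                yield row
```

For example, `list(read_redfin_rows(path, state_code="WA"))` would collect only Washington rows. Every value comes back as a string, so convert numeric columns such as `median_sale_price` before doing arithmetic.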
Data Source & Credit: Redfin.com
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Detailed Real Estate Data for Predicting House Prices and Analyzing Market Trends
This dataset contains information on 21,613 properties, making it a comprehensive resource for exploring real estate market trends and building predictive models for house prices. The data includes various features capturing property details, location, and market conditions, providing ample opportunities for data exploration, visualization, and machine learning applications.
General Information:
- id: Unique identifier for each property.
- date: Date of sale.
Price Details:
- price: Sale price of the house.
Property Features:
- bedrooms: Number of bedrooms.
- bathrooms: Number of bathrooms (including partials as fractions).
- sqft_living: Living space area in square feet.
- sqft_lot: Lot size in square feet.
- floors: Number of floors.
- waterfront: Whether the property has a waterfront view.
- view: Quality of the view rating.
- condition: Overall condition of the house.
- grade: Grade of construction and design (scale of 1–13).
Additional Metrics:
- sqft_above: Square footage of the property above ground.
- sqft_basement: Basement area in square feet.
- yr_built: Year the property was built.
- yr_renovated: Year of last renovation.
Location Coordinates:
- zipcode: ZIP code of the property.
- lat and long: Latitude and longitude coordinates.
Neighbor Comparisons:
- sqft_living15: Average living space of 15 nearest properties.
- sqft_lot15: Average lot size of 15 nearest properties.

This dataset is a valuable resource for anyone interested in real estate analytics, machine learning, or geographic data visualization.
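The two `*15` columns make it easy to express a property relative to its surroundings. A small sketch, assuming a row is available as a dict keyed by the column names above; the derived feature names are hypothetical. A ratio above 1 means the house is larger than the average of its 15 nearest neighbors.

```python
def neighbor_ratios(house: dict) -> dict:
    """Compare a property's size with the average of its 15 nearest neighbors."""
    return {
        # Living space relative to the local average (hypothetical feature name).
        "living_vs_neighbors": house["sqft_living"] / house["sqft_living15"],
        # Lot size relative to the local average (hypothetical feature name).
        "lot_vs_neighbors": house["sqft_lot"] / house["sqft_lot15"],
    }


example = {"sqft_living": 2400, "sqft_living15": 1600, "sqft_lot": 5000, "sqft_lot15": 10000}
ratios = neighbor_ratios(example)
```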
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Task Description: Real Estate Price Prediction
This task involves predicting the price of real estate properties based on various features that influence the value of a property. The dataset contains several attributes of real estate properties such as square footage, the number of bedrooms, bathrooms, floors, the year the property was built, whether the property has a garden or pool, the size of the garage, the location score, and the distance from the city center.
The goal is to build a regression model that can predict the Price of a property based on the provided features.
Dataset Columns:
ID: A unique identifier for each property.
Square_Feet: The area of the property in square meters.
Num_Bedrooms: The number of bedrooms in the property.
Num_Bathrooms: The number of bathrooms in the property.
Num_Floors: The number of floors in the property.
Year_Built: The year the property was built.
Has_Garden: Indicates whether the property has a garden (1 for yes, 0 for no).
Has_Pool: Indicates whether the property has a pool (1 for yes, 0 for no).
Garage_Size: The size of the garage in square meters.
Location_Score: A score from 0 to 10 indicating the quality of the neighborhood (higher scores indicate better neighborhoods).
Distance_to_Center: The distance from the property to the city center in kilometers.
Price: The target variable that represents the price of the property. This is the value we aim to predict.
Objective: The goal of this task is to develop a regression model that predicts the Price of a real estate property using the other features as inputs. The model should be able to learn the relationship between these features and the price, providing an accurate prediction for unseen data.
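A minimal sketch of this objective using ordinary least squares in NumPy. The data is synthetic, generated from made-up coefficients for four of the listed columns so the fit can be checked against known values; with the real dataset you would load the CSV instead. The 80/20 hold-out split and the function names are illustrative choices.

```python
import numpy as np


def fit_linear_regression(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


# Synthetic data generated from known (made-up) coefficients plus noise.
rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.uniform(50, 300, n),  # Square_Feet (synthetic values)
    rng.integers(1, 6, n),    # Num_Bedrooms
    rng.uniform(0, 10, n),    # Location_Score
    rng.uniform(0, 20, n),    # Distance_to_Center (km)
])
true_coef = np.array([50_000, 1_000, 10_000, 15_000, -2_000])
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0, 5_000, n)

# Hold out the last 20% of rows to measure error on unseen data.
split = int(0.8 * n)
coef = fit_linear_regression(X[:split], y[:split])
mae = np.mean(np.abs(predict(coef, X[split:]) - y[split:]))
```

The recovered coefficients should land close to `true_coef`, and the hold-out mean absolute error should be on the order of the injected noise.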
License: CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset was collected for a college project: an Android app for calculating house prices.
The data was scraped from the MagicBricks website (magicbricks.com) between June 2021 and July 2021.
With the help of the data available one can make a regression model to predict house prices.
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
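A sketch of the two derived features mentioned above. The column names `Price`, `Square_Footage`, and `Year_Built` are hypothetical, since the description does not list the dataset's headers. Note that price per square foot is computed from the target, so it suits analysis and EDA but should not be fed back in as a model input, because that would leak the answer.

```python
def add_engineered_features(house: dict, reference_year: int) -> dict:
    """Return a copy of `house` with two derived columns added.

    Keys "Price", "Square_Footage" and "Year_Built" are hypothetical names;
    rename them to match the actual CSV header.
    """
    out = dict(house)
    # Age of the house in years at the chosen reference year (e.g. the year of sale).
    out["House_Age"] = reference_year - house["Year_Built"]
    # Derived from the target: use for analysis only, not as a model input.
    out["Price_per_SqFt"] = house["Price"] / house["Square_Footage"]
    return out


row = add_engineered_features(
    {"Price": 300_000, "Square_Footage": 1_500, "Year_Built": 1995},
    reference_year=2024,
)
```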
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
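A sketch of PCA on standardized numeric features via NumPy's SVD. The synthetic columns (square footage, bedrooms, bathrooms, age) are hypothetical; bedrooms and bathrooms are generated from square footage, so the first principal component should capture most of the variance.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """PCA via SVD of the standardized data.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    # Standardize each column so features in different units contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Project the data onto the leading principal directions.
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


rng = np.random.default_rng(1)
sqft = rng.uniform(500, 4000, 300)
bedrooms = sqft / 800 + rng.normal(0, 0.3, 300)
bathrooms = sqft / 1200 + rng.normal(0, 0.3, 300)
age = rng.uniform(0, 100, 300)
X = np.column_stack([sqft, bedrooms, bathrooms, age])
scores, explained = pca(X, n_components=2)
```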
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides insights into the global housing market, covering various economic factors from 2015 to 2024. It includes details about property prices, rental yields, interest rates, and household income across multiple countries. This dataset is ideal for real estate analysis, financial forecasting, and market trend visualization.
| Column Name | Description |
|---|---|
| Country | The country where the housing market data is recorded 🌍 |
| Year | The year of observation 📅 |
| Average House Price ($) | The average price of houses in USD 💰 |
| Median Rental Price ($) | The median monthly rent for properties in USD 🏠 |
| Mortgage Interest Rate (%) | The average mortgage interest rate percentage 📉 |
| Household Income ($) | The average annual household income in USD 🏡 |
| Population Growth (%) | The percentage increase in population over the year 👥 |
| Urbanization Rate (%) | Percentage of the population living in urban areas 🏙️ |
| Homeownership Rate (%) | The percentage of people who own their homes 🔑 |
| GDP Growth Rate (%) | The annual GDP growth percentage 📈 |
| Unemployment Rate (%) | The percentage of unemployed individuals in the labor force 💼 |
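Two ratios commonly derived from columns like these are the price-to-income ratio and the gross rental yield. A sketch, assuming the values are combined for the same country and year; the function name and inputs are illustrative. It mixes an average (house price) with a median (rent), so treat the yield as a rough indicator rather than a precise figure.

```python
def affordability_metrics(avg_house_price: float,
                          median_monthly_rent: float,
                          household_income: float) -> dict:
    """Derive two simple affordability indicators from one country-year row."""
    return {
        # Years of gross household income needed to buy an average house.
        "price_to_income": avg_house_price / household_income,
        # Annual rent as a percentage of the purchase price (gross, before costs).
        "gross_rental_yield_pct": 100 * 12 * median_monthly_rent / avg_house_price,
    }


# Illustrative values, not taken from the dataset.
metrics = affordability_metrics(avg_house_price=400_000,
                                median_monthly_rent=2_000,
                                household_income=80_000)
```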
By Jeff [source]
This dataset contains information on 1000 properties in Australia, including location, size, price, and other details.
If you're looking for an Australian housing dataset, this is a great option. It contains information on 1000 properties in Australia, including location, size, price, and other details. With this data, you can answer questions such as "What is the average price of a home in Australia?" and "What are the most popular types of homes in Australia?"
- This dataset can be used to predict housing prices in Australia.
- This dataset can be used to find relationships between housing prices and location.
- This dataset can be used to find relationships between housing prices and features such as size, number of bedrooms, and number of bathrooms.
If you use this dataset in your research, please credit the original authors.
License: see the dataset description for more information.
File: RealEstateAU_1000_Samples.csv

| Column name | Description |
|---|---|
| breadcrumb | A breadcrumb is a text trail that shows the user's location within a website. (String) |
| category_name | The name of the category that the listing belongs to. (String) |
| property_type | The type of property being listed. (String) |
| building_size | The size of the property's building, in square meters. (Numeric) |
| land_size | The size of the property's land, in square meters. (Numeric) |
| preferred_size | The preferred size of the property, in square meters. (Numeric) |
| open_date | The date that the property was first listed for sale. (Date) |
| listing_agency | The agency that is listing the property. (String) |
| price | The listing price of the property. (Numeric) |
| location_number | The number that corresponds to the property's location. (Numeric) |
| location_type | The type of location that the property is in. (String) |
| location_name | The name of the location that the property is in. (String) |
| address | The property's address. (String) |
| address_1 | The first line of the property's address. (String) |
| city | The city that the property is located in. (String) |
| state | The state that the property is located in. (String) |
| zip_code | The zip code that the property is located in. (String) |
| phone | The listing agent's phone number. (String) |
| latitude | The property's latitude. (Numeric) |
| longitude | The property's longitude. (Numeric) |
| product_depth | The depth of the product. (Numeric) |
| bedroom_count | The number of bedrooms in the property. (Numeric) |
| bathroom_count | The number of bathrooms in the property. (Numeric) |
| parking_count | The number of parking spaces in the property. (Numeric) |
| RunDate | The date that the listing was last updated. (Date) |
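To relate price to location using the `latitude` and `longitude` columns, a common first feature is each listing's distance to a reference point such as a city center. This sketch uses the haversine great-circle formula with a mean Earth radius of 6,371 km. The reference point is your choice; the coordinates in the usage line are approximate values for Sydney and Melbourne.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates: Sydney to Melbourne.
sydney_to_melbourne = haversine_km(-33.8688, 151.2093, -37.8136, 144.9631)
```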
If you use this dataset in your research, please credit Jeff.
Dataset on housing prices in the Philippines, scraped from Lamudi in May 2023.
This dataset was created from imaginary house-price data for an urban environment (Paris). I recommend using it for educational purposes: to practice and acquire the necessary knowledge. A classification version of this dataset, built from the same data with an added column for the class attribute, is available here: ---> classification dataset <---
What's inside is more than just rows and columns. You can see house details listed as column names.
All attributes are numeric variables and are listed below:
The idea was to create a dataset that is well suited to regression and gives adequate results.
Description: This dataset provides historical housing prices scraped from Centaline Property Hong Kong, one of the largest real estate agencies in Hong Kong. The dataset includes the date of the transaction, the property address, floor plan, saleable area, unit rate, source, and district. It covers several years, allowing analysis of trends and changes in the Hong Kong housing market.
Columns:
- Date: the date of the property transaction
- Address: the address of the property
- Floor Plan: --
- Price: the price of the property
- Changes: any changes made to the property since the last transaction
- Saleable Area: the area of the property that can be sold to a buyer
- Unit Rate: the price per square foot of saleable area
- Source: the source of the data (Centaline Property Hong Kong / Land Registry)
- District: the district in which the property is located in Hong Kong
This dataset was created by 173050055
Released under Other (specified in description)
License: Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
The dataset consists of prices of houses in King County, Washington, sold between May 2014 and May 2015. Along with the house price, it includes 18 house features, the date of sale, and the sale ID.
License: CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
A housing market prediction that many experts agree on is that it will be a seller's market: home prices are expected to rise for some time because demand is increasing while supply remains limited. Millennials are reaching the age at which they start investing in real estate for the first time, so demand for residential and commercial projects keeps rising.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A dataset comprising various variables around housing and demographics for the top 50 American cities by population.
Variables:
Zip Code: Zip code in which the listing is located.
Price: Listed price for the property.
Beds: Number of beds mentioned in the listing.
Baths: Number of baths mentioned in the listing.
Living Space: The total size of the living space, in square feet, mentioned in the listing.
Address: Street address of the listing.
City: City name where the listing is located.
State: State name where the listing is located.
Zip Code Population: The estimated number of individuals within the zip code. Data from Simplemaps.com.
Zip Code Density: The estimated number of individuals per square mile within the zip code. Data from Simplemaps.com.
County: County where the listing is located.
Median Household income: Estimated median household income. Data from the U.S. Census Bureau.
Latitude: Latitude of the zip code. Data from Simplemaps.com.
Longitude: Longitude of the zip code. Data from Simplemaps.com.
House price prediction mini dataset for beginner notebooks.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 2,000 entries of house price data from all states in Malaysia, providing a comprehensive overview of the country’s real estate market for 2025. Sourced from Brickz, a trusted platform for property transaction insights, it includes detailed information such as property location, tenure, type, median prices, and transaction counts. This dataset is ideal for real estate market analysis, predictive modeling, and exploring trends across Malaysia’s diverse property market.