https://creativecommons.org/publicdomain/zero/1.0/
A simple yet challenging project: predict the housing price based on factors such as house area, bedrooms, furnishing status, proximity to the main road, etc. The dataset is small, yet its complexity arises from strong multicollinearity. Can you overcome these obstacles and build a decent predictive model?
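One common way to check for the multicollinearity mentioned above is to look at pairwise correlations and the variance inflation factor (VIF). The sketch below is only an illustration: the `area` and `bedrooms` values are hypothetical, not taken from the dataset, and the VIF formula used is the special case for exactly two predictors.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(xs, ys):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical values for illustration only (not from the dataset).
area = [1200, 1500, 1800, 2400, 3000, 3600]
bedrooms = [2, 3, 3, 4, 4, 5]

r = pearson(area, bedrooms)
vif = vif_two_predictors(area, bedrooms)
print(f"r = {r:.3f}, VIF = {vif:.2f}")
```

A VIF well above roughly 5 to 10 is often read as a sign that two predictors carry overlapping information; with more than two predictors, each VIF would come from regressing one predictor on all the others.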
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102. Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Median price paid for residential property in England and Wales, for all property types, by lower layer super output area. Annual data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Housing Index in China remained unchanged at -2.20 percent in October. This dataset provides the latest reported value for - China Newly Built House Prices YoY Change - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single Family Home Prices in the United States increased to 415200 USD in October from 412300 USD in September of 2025. This dataset provides - United States Existing Single Family Home Prices- actual values, historical data, forecast, chart, statistics, economic calendar and news.
This dataset contains prices of New York houses, providing valuable insights into the real estate market in the region. It includes information such as broker titles, house types, prices, number of bedrooms and bathrooms, property square footage, addresses, state, administrative and local areas, street names, and geographical coordinates.
- BROKERTITLE: Title of the broker
- TYPE: Type of the house
- PRICE: Price of the house
- BEDS: Number of bedrooms
- BATH: Number of bathrooms
- PROPERTYSQFT: Square footage of the property
- ADDRESS: Full address of the house
- STATE: State of the house
- MAIN_ADDRESS: Main address information
- ADMINISTRATIVE_AREA_LEVEL_2: Administrative area level 2 information
- LOCALITY: Locality information
- SUBLOCALITY: Sublocality information
- STREET_NAME: Street name
- LONG_NAME: Long name
- FORMATTED_ADDRESS: Formatted address
- LATITUDE: Latitude coordinate of the house
- LONGITUDE: Longitude coordinate of the house
- Price analysis: Analyze the distribution of house prices to understand market trends and identify potential investment opportunities.
- Property size analysis: Explore the relationship between property square footage and prices to assess the value of different-sized houses.
- Location-based analysis: Investigate geographical patterns to identify areas with higher or lower property prices.
- Bedroom and bathroom trends: Analyze the impact of the number of bedrooms and bathrooms on house prices.
- Broker performance analysis: Evaluate the influence of different brokers on the pricing of houses.
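As an illustration of the property size and location-based analyses listed above, the following sketch computes the median price per square foot for each locality. Only the column names `PRICE`, `PROPERTYSQFT` and `LOCALITY` come from the description; the sample rows and the helper name `median_price_per_sqft` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical sample rows; only the column names come from the dataset description.
SAMPLE = """PRICE,PROPERTYSQFT,LOCALITY
500000,1000,New York
750000,1500,New York
300000,1200,Kings County
450000,1500,Kings County
"""

def median_price_per_sqft(rows):
    """Group rows by LOCALITY and return the median PRICE / PROPERTYSQFT per group."""
    groups = defaultdict(list)
    for row in rows:
        sqft = float(row["PROPERTYSQFT"])
        if sqft <= 0:
            continue  # skip rows without a usable area
        groups[row["LOCALITY"]].append(float(row["PRICE"]) / sqft)
    return {locality: median(values) for locality, values in groups.items()}

result = median_price_per_sqft(csv.DictReader(io.StringIO(SAMPLE)))
print(result)
```

With the real file, the in-memory sample would be replaced by an opened CSV file passed to `csv.DictReader`.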
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average House Prices in the United States increased to 534100 USD in August from 478200 USD in July of 2025. This dataset includes a chart with historical data for the United States New Home Average Sales Price.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Key information about House Prices Growth
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
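To make the model evaluation idea concrete, here is a minimal sketch of a one-feature linear regression scored with k-fold cross-validation and Mean Absolute Error. It uses only the Python standard library; the `square_feet` and `price` values are hypothetical, and the simple "every k-th sample" fold split is an assumption chosen for brevity, not something prescribed by the dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cross_validated_mae(xs, ys, k=3):
    """Average the held-out MAE over k folds (fold i holds out every k-th sample)."""
    fold_maes = []
    for fold in range(k):
        pairs = list(enumerate(zip(xs, ys)))
        train = [pair for i, pair in pairs if i % k != fold]
        held_out = [pair for i, pair in pairs if i % k == fold]
        slope, intercept = fit_line(*zip(*train))
        errors = [abs(y - (slope * x + intercept)) for x, y in held_out]
        fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / k

# Hypothetical data for illustration only.
square_feet = [800, 1000, 1200, 1500, 1800, 2000, 2400, 2800, 3000]
price = [152000, 181000, 205000, 248000, 297000, 315000, 371000, 428000, 455000]

mae = cross_validated_mae(square_feet, price, k=3)
print(f"Cross-validated MAE: {mae:.0f}")
```

The same loop could wrap any other model, which makes it easy to compare a linear fit against, say, a tree-based model on equal terms.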
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
House Price Index YoY in the United States decreased to 1.70 percent in September from 2.40 percent in August of 2025. This dataset includes a chart with historical data for the United States FHFA House Price Index YoY.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Median Sales Price of Houses Sold for the United States (MSPUS) from Q1 1963 to Q2 2025 about sales, median, housing, and USA.
https://creativecommons.org/publicdomain/zero/1.0/
I previously shared a classification dataset for classifying gender, which is liked by those new to machine learning because it gives pretty good accuracy. That encouraged me to create a regression dataset for predicting continuous values. I have tried many real-world datasets for regression problems, and they predicted with low accuracy and a high error rate. As a beginner, I struggled and worried about why and how those datasets performed poorly. This is another main reason why I created this dataset. Although this is a made-up dataset, I considered all the features when deciding the price of the property. If you are a beginner, you will love trying this, as the results are stunning.
Since this is synthetic data, I will go straight to explaining the features and the label.
FEATURES
1. land_size_sqm - The total size of the land in square meters.
2. house_size_sqm - The area of the land occupied by the house, in square meters.
3. no_of_rooms - The number of rooms in the house.
4. no_of_bathrooms - The total number of bathrooms in the house.
5. large_living_room - Whether the house has a large living room. All houses are assumed to have a living room; '1' means large and '0' means small.
6. parking_space - Whether there is a parking space. '1' means parking is available and '0' means it is not.
7. front_garden - Whether there is a garden in front of the house. '1' means a garden is available and '0' means it is not.
8. swimming_pool - Whether the house has a swimming pool. '1' means a pool is available and '0' means it is not.
9. distance_to_school_km - The distance from the house to the nearest school in kilometers.
10. wall_fence - Whether there is a wall fence. '1' means there is a wall fence and '0' means there is none.
11. house_age_or_renovated - Either the age of the house in years or the time since its last renovation.
12. water_front - Whether the house is located in front of the water. '1' means waterfront and '0' means it is not located near the water.
13. distance_to_supermarket_km - The distance to the nearest supermarket in kilometers.
In the categorical dataset, the 1 and 0 values of features 5, 6, 7, 8, 10 and 12 are represented as 'yes' and 'No' respectively.
LABEL
property_value - The price of the property.
The following features are available only in the "house price dataset original v2 cleaned" and "house price dataset original v2 with categorical features" data.
14. crime_rate - A float between 0 and 7; lower is better.
15. room_size - The size of the rooms: 0 is 'small', 1 is 'medium', 2 is 'large' and 3 is 'Extra large'. In the categorical dataset, these values are categorical and self-explanatory.
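The mapping between the categorical and numeric versions of the data can be sketched as follows. The column names and the 0/1 and 0 to 3 codes come from the description above; the exact spelling of the category strings in the file (for example 'Extra large') and the helper name `encode_row` are assumptions, so the lookup is made case-insensitive.

```python
# Codes taken from the feature description; string spellings are assumed.
YES_NO = {"yes": 1, "no": 0}
ROOM_SIZE = {"small": 0, "medium": 1, "large": 2, "extra large": 3}
BINARY_COLUMNS = [
    "large_living_room", "parking_space", "front_garden",
    "swimming_pool", "wall_fence", "water_front",
]

def encode_row(row):
    """Return a copy of a row dict with yes/no and room_size values as integers."""
    encoded = dict(row)
    for column in BINARY_COLUMNS:
        if column in encoded:
            encoded[column] = YES_NO[str(encoded[column]).strip().lower()]
    if "room_size" in encoded:
        encoded["room_size"] = ROOM_SIZE[str(encoded["room_size"]).strip().lower()]
    return encoded

# Hypothetical row for illustration only.
example = {"parking_space": "yes", "swimming_pool": "No",
           "room_size": "Extra large", "no_of_rooms": 3}
encoded_example = encode_row(example)
print(encoded_example)
```

Because room_size is ordinal, integer codes preserve its order; a one-hot encoding would be the alternative if that order should not be assumed.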
I spent around 3 hours creating this dataset. Enjoy.
Share your notebooks to see which algorithm predicts the house price precisely.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios calculated by dividing house prices by gross annual residence-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
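The calculation described above is a simple division, sketched here for both the median and the lower quartile. The figures are hypothetical placeholders, not published values, and `affordability_ratio` is an illustrative helper name.

```python
def affordability_ratio(house_price, annual_earnings):
    """House price divided by gross annual residence-based earnings."""
    if annual_earnings <= 0:
        raise ValueError("annual_earnings must be positive")
    return house_price / annual_earnings

# Hypothetical figures for illustration only.
median_ratio = affordability_ratio(280_000, 35_000)
lower_quartile_ratio = affordability_ratio(180_000, 24_000)
print(median_ratio, lower_quartile_ratio)
```

A ratio of 8 would mean that a median-priced home costs eight times median gross annual earnings.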
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average House Prices in Canada increased to 688800 CAD in October from 687600 CAD in September of 2025. This dataset includes a chart with historical data for Canada Average House Prices.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Average Sales Price of Houses Sold for the United States (ASPUS) from Q1 1963 to Q2 2025 about sales, housing, and USA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Housing Index in the United States decreased to 435.40 points in September from 435.60 points in August of 2025. This dataset provides the latest reported value for - United States House Price Index MoM Change - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was made to check the correlation between a housing price index and the crime rate. Housing prices rise and fall for various reasons, the obvious ones being the facilities of the house and its neighborhood. Think of a place like Detroit, where there are hoodlums and you don't want to end up buying a house in the wrong place. This dataset serves as historical crime rate data, which in turn can be used to predict whether housing prices will rise or fall. A rise in housing prices would suggest a decrease in the crime rate over the years, and vice versa.
The headers are self-explanatory. index_nsa is the non-seasonally adjusted housing price index.
Thank you to my team who helped in achieving this.
Data was collected from these two sources and merged to get the resulting dataset: https://www.kaggle.com/marshallproject/crime-rates and https://catalog.data.gov/dataset/fhfa-house-price-indexes-hpis
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This is the unadjusted lower quartile house price for residential property sales (transactions) in the area for a 12-month period with April in the middle (year-ending September). These figures have been produced by the ONS (Office for National Statistics) using the Land Registry (LR) Price Paid data on residential dwelling transactions.
The LR Price Paid data are comprehensive in that they capture changes of ownership for individual residential properties which have sold for full market value and covers both cash sales and those involving a mortgage.
The lower quartile is the value determined by putting all the house sales for a given year, area and type in order of price and then selecting the price of the house sale which falls three quarters of the way down the list, such that 75% of transactions lie above and 25% lie below that value. These figures are particularly useful for assessing housing affordability when viewed alongside average and lower quartile income for given areas.
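A lower quartile of this kind can be computed with Python's standard library, as in the sketch below. The sale prices are hypothetical, and the "inclusive" interpolation method is an assumption: the description does not state which rule the ONS applies when the quartile falls between two sales.

```python
from statistics import quantiles

def lower_quartile(prices):
    """First quartile (25th percentile) of a list of sale prices."""
    # quantiles() sorts the data and returns the three cut points for n=4;
    # index 0 is the lower quartile.
    return quantiles(prices, n=4, method="inclusive")[0]

# Hypothetical sale prices for illustration only.
sales = [500_000, 100_000, 300_000, 200_000, 400_000]
q1 = lower_quartile(sales)
print(q1)
```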
Note that a transaction occurs when a change of freeholder or leaseholder takes place regardless of the amount of money involved and a property can transact more than once in the time period.
The LR records the actual price for which the property changed hands. This will usually be an accurate reflection of the market value for the individual property, but it is not always the case. In order to generate statistics that more accurately reflect market values, the LR has excluded records of houses that were not sold at market value from the dataset. The remaining data are considered a good reflection of market values at the time of the transaction. For full details of exclusions and more information on the methodology used to produce these statistics please see http://www.ons.gov.uk/peoplepopulationandcommunity/housing/qmis/housepricestatisticsforsmallareasqmi
The LR Price Paid data are not adjusted to reflect the mix of houses in a given area. Fluctuations in the types of house that are sold in that area can cause differences between the lower quartile transactional value of houses and the overall market value of houses.
If, for a given year, house type and area, there were fewer than 5 sales records in the LR Price Paid data, the house price statistics are not reported. Data is powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Lower quartile price paid for residential property in England and Wales, by property type and administrative geographies. Annual data.