For a data scientist eager to experiment with, learn, and explore the techniques used in this field, Exploratory Data Analysis on the many datasets available is too important to overlook.
This housing dataset provides a thorough analysis of the current state of the housing market. It includes information on housing prices, availability, and key trends, allowing you to gain a better understanding of the market and make informed decisions. Whether you're a homebuyer, investor, or simply interested in the state of the housing market, this dataset has valuable insights to offer.
Data are updated semiannually, at the end of the second and fourth quarters of each year.
Please see DCP’s annual Housing Production Snapshot summarizing findings from the 21Q4 data release here. Additional Housing and Economic analyses are also available.
The NYC Department of City Planning’s (DCP) Housing Database Unit Change Summary Files provide the net change in Class A housing units since 2010, and the count of units pending completion, for commonly used political and statistical boundaries (Census Block, Census Tract, City Council district, Community District, Community District Tabulation Area (CDTA), and Neighborhood Tabulation Area (NTA)). These tables are aggregated from the DCP Housing Database Project-Level Files, which are derived from Department of Buildings (DOB) approved housing construction and demolition jobs filed or completed in NYC since January 1, 2010. Net housing unit change is calculated as the sum of all three construction job types that add or remove residential units: new buildings, major alterations, and demolitions. These files can be used to determine the change in legal housing units across time and space.
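The net-change calculation described above can be sketched roughly as follows. This is only an illustration: the record layout, the district labels, and the unit counts are invented here and are not taken from the DCP files.

```python
from collections import defaultdict


def net_unit_change(jobs):
    """Sum the signed unit change of every job, grouped by boundary.

    Each job is a (boundary, job_type, unit_change) tuple. New buildings
    carry a positive change, demolitions a negative one, and alterations
    can be either. This tuple layout is a hypothetical simplification.
    """
    totals = defaultdict(int)
    for boundary, _job_type, unit_change in jobs:
        totals[boundary] += unit_change
    return dict(totals)


# Invented sample records, for illustration only.
sample_jobs = [
    ("District A", "new_building", 120),
    ("District A", "alteration", -3),
    ("District A", "demolition", -12),
    ("District B", "new_building", 40),
    ("District B", "demolition", -5),
]

print(net_unit_change(sample_jobs))
```

Keeping the sign on each record means the three job types need no special handling: the net change is a plain sum per boundary.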
The dataset contains current data on low rent and Section 8 units in PHAs administered by HUD. The Section 8 Rental Voucher Program increases affordable housing choices for very low-income households by allowing families to choose privately owned rental housing. Through the Section 8 Rental Voucher Program, the administering housing authority issues a voucher to an income-qualified household, which then finds a unit to rent. If the unit meets the Section 8 quality standards, the PHA then pays the landlord the amount equal to the difference between 30 percent of the tenant's adjusted income (or 10 percent of the gross income or the portion of welfare assistance designated for housing) and the PHA-determined payment standard for the area. The rent must be reasonable compared with similar unassisted units.
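The payment arithmetic in the description above can be illustrated with a small sketch. Two things are assumptions made here, not statements from the dataset description: that all amounts are monthly, and that the tenant's share is the largest of the three listed alternatives (the text only lists them with "or"). The function and parameter names are hypothetical.

```python
def section8_payment(payment_standard, adjusted_income, gross_income,
                     welfare_housing_portion=0.0):
    """Estimate the PHA payment to the landlord for one month.

    Assumption: the tenant's share is the largest of the three
    alternatives named in the description.
    """
    tenant_share = max(
        0.30 * adjusted_income,
        0.10 * gross_income,
        welfare_housing_portion,
    )
    # The PHA covers the gap up to the payment standard, never less than zero.
    return max(0.0, payment_standard - tenant_share)


# Invented figures: a 900 payment standard and 1000 of adjusted income.
print(section8_payment(payment_standard=900.0,
                       adjusted_income=1000.0,
                       gross_income=1200.0))
```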
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for All-Transactions House Price Index for the United States (USSTHPI) from Q1 1975 to Q2 2025 about appraisers, HPI, housing, price index, indexes, price, and USA.
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Housing Inventory: Median Days on Market in the United States (MEDDAYONMARUS) from Jul 2016 to Sep 2025 about median and USA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This dataset was collected and curated to support research on predicting real estate prices using machine learning algorithms, specifically Support Vector Regression (SVR) and Gradient Boosting Machine (GBM). The dataset includes comprehensive information on residential properties, enabling the development and evaluation of predictive models for accurate and transparent real estate appraisals.

Data Source: The data was sourced from Department of Lands and Survey real estate listings.

Features: The dataset contains the following key attributes for each property:
Area (in square meters): The total living area of the property.
Floor Number: The floor on which the property is located.
Location: Geographic coordinates or city/region where the property is situated.
Type of Apartment: The classification of the property, such as studio, one-bedroom, two-bedroom, etc.
Number of Bathrooms: The total number of bathrooms in the property.
Number of Bedrooms: The total number of bedrooms in the property.
Property Age (in years): The number of years since the property was constructed.
Property Condition: A categorical variable indicating the condition of the property (e.g., new, good, fair, needs renovation).
Proximity to Amenities: The distance to nearby amenities such as schools, hospitals, shopping centers, and public transportation.
Market Price (target variable): The actual sale price or listed price of the property.

Data Preprocessing:
Normalization: Numeric features such as area and proximity to amenities were normalized to ensure consistency and improve model performance.
Categorical Encoding: Categorical features like property condition and type of apartment were encoded using one-hot encoding or label encoding, depending on the specific model requirements.
Missing Values: Missing data points were handled using appropriate imputation techniques or by excluding records with significant missing information.

Usage: This dataset was utilized to train and test machine learning models, aiming to predict the market price of residential properties based on the provided attributes. The models developed using this dataset demonstrated improved accuracy and transparency over traditional appraisal methods.

Dataset Availability: The dataset is available for public use under the CC BY 4.0 license. Users are encouraged to cite the related publication when using the data in their research or applications.

Citation: If you use this dataset in your research, please cite the following publication: Real Estate Decision-Making: Precision in Price Prediction through Advanced Machine Learning Algorithms.
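As a rough illustration of the preprocessing steps named in this description, the sketch below shows normalization and one-hot encoding in plain Python. The description does not say which normalization was applied, so min-max scaling is an assumption here, and the sample values are invented.

```python
def min_max_normalize(values):
    """Scale numeric values to the range [0, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Return the sorted category list and one 0/1 row per label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Invented sample values for two of the listed attributes.
areas_sqm = [50, 75, 100]
conditions = ["new", "good", "new"]

print(min_max_normalize(areas_sqm))
print(one_hot_encode(conditions))
```

Label encoding, the other option the description mentions, would instead map each category to a single integer.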
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
House Price Index YoY in the United States decreased to 2.30 percent in July from 2.70 percent in June of 2025. This dataset includes a chart with historical data for the United States FHFA House Price Index YoY.
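For readers unfamiliar with the "YoY" figure, a year-over-year rate is conventionally the percent change of an index from the same period one year earlier. The sketch below shows that convention with invented index levels; it is not the publisher's exact procedure.

```python
def yoy_percent(current, year_ago):
    """Percent change of an index value from the same period a year earlier."""
    return (current / year_ago - 1.0) * 100.0


# Invented index levels, for illustration only.
print(round(yoy_percent(102.3, 100.0), 2))
```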
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was collected for a college project: an Android app for calculating house prices.
The data was scraped from the Magicbricks website between June 2021 and July 2021.
magicbricks.com
With the available data, one can build a regression model to predict house prices.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Housing Starts in the United States decreased to 1307 Thousand units in August from 1429 Thousand units in July of 2025. This dataset provides the latest reported value for - United States Housing Starts - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In 2021, Allegheny County Economic Development (ACED), in partnership with the Urban Redevelopment Authority of Pittsburgh (URA), completed a Market Value Analysis (MVA) for Allegheny County. This analysis serves as an update to previous MVAs commissioned separately by ACED and the URA and combines them into a single MVA for the whole of Allegheny County (inclusive of the City of Pittsburgh). The MVA is a unique tool for characterizing markets because it creates an internally referenced index of a municipality’s residential real estate market. It identifies areas that are the highest demand markets as well as areas of greatest distress, and the various market types in between. The MVA offers insight into the variation in market strength and weakness within and between traditional community boundaries because it uses Census block groups as the unit of analysis. Where market types abut each other on the map becomes instructive about the potential direction of market change, and ultimately, the appropriateness of types of investment or intervention strategies.
This MVA utilized data that helps to define the local real estate market. The data used covers the 2017-2019 period, and data used in the analysis includes:
The MVA uses a statistical technique known as cluster analysis, forming groups of areas (i.e., block groups) that are similar along the MVA descriptors, noted above. The goal is to form groups whose members share similar characteristics, while each group differs from the others. Using this technique, the MVA condenses vast amounts of data for the universe of all properties to a manageable, meaningful typology of market types that can inform area-appropriate programs and decisions regarding the allocation of resources.
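To make the idea of cluster analysis concrete, here is a minimal k-means sketch in plain Python. The description does not say which clustering algorithm the MVA uses, so k-means is only a familiar stand-in, and the two features and all values below are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (labels, centroids).

    For reproducibility the first k points seed the centroids, which is
    a simplification rather than a recommended initialization.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical block groups described by two scaled indicators each.
block_groups = [
    (1.0, 0.1), (0.1, 0.9), (0.9, 0.2),
    (0.2, 1.0), (1.1, 0.15), (0.15, 0.95),
]

labels, centroids = kmeans(block_groups, k=2)
print(labels)
```

Block groups that end up with the same label play the role of one "market type" in this toy version.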
Please refer to the presentation and executive summary for more information about the data, methodology, and findings.
http://reference.data.gov.uk/id/open-government-licence
This dataset provides information on Social Housing Asset Data at Salford City Council. Details are provided to meet the required standards of the Local Government Transparency Code 2014.
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Real Residential Property Prices for United States (QUSR628BIS) from Q1 1970 to Q2 2025 about residential, HPI, housing, real, price index, indexes, price, and USA.
The U.S. Department of Housing and Urban Development (HUD) periodically receives custom tabulations of data from the U.S. Census Bureau that are largely not available through standard Census products. These data, known as the CHAS data (Comprehensive Housing Affordability Strategy), demonstrate the extent of housing problems and housing needs, particularly for low income households. The CHAS data are used by local governments to plan how to spend HUD funds, and may also be used by HUD to distribute grant funds.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Phoenix housing data from the American Community Survey (ACS) 1-year estimates
https://www.icpsr.umich.edu/web/ICPSR/studies/25204/terms
The Housing Affordability Data System (HADS) is a set of housing unit level datasets that measures the affordability of housing units and the housing cost burdens of households, relative to area median incomes, poverty level incomes, and Fair Market Rents. The purpose of these datasets is to provide housing analysts with consistent measures of affordability and burdens over a long period. The datasets are based on the American Housing Survey (AHS) national files from 1985 through 2005 and the metropolitan files for 2002 and 2004. Users can link records in HADS files to AHS records, allowing access to all of the AHS variables. Housing-level variables include information on the number of rooms in the housing unit, the year the unit was built, whether it was occupied or vacant, whether the unit was rented or owned, whether it was a single family or multiunit structure, the number of units in the building, the current market value of the unit, and measures of relative housing costs. The dataset also includes variables describing the number of people living in the household, household income, and the type of residential area (e.g., urban or suburban).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
NOTICE: This dataset is incomplete and in the process of being updated. Please contact david.cruz@austintexas.gov with any questions.
This dataset contains all income-restricted housing within the Austin Full Purpose and into the 5-mile Extra Territorial Jurisdiction. This includes properties funded by the City of Austin along with the Housing Authority City of Austin, Housing Authority of Travis County, and Texas Department of Housing and Community Affairs. Some properties may be funded by more than one entity. The property attributes are intended to help Austin residents find income-restricted housing that best suits their needs.
The dataset is connected to the affordable housing data hub which is consistently updated with the most current property information. A Feature Manipulation Engine Script pulls a new dataset to the Open Data Portal on a daily basis.
New housing price index (NHPI). Monthly data are available from January 1981. The table presents data for the most recent reference period and the last four periods. The base period for the index is (201612=100).
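The notation "201612=100" means values are expressed relative to December 2016, which is set to 100. The sketch below shows that rebasing arithmetic on an invented price series; the period keys and values are hypothetical and are not NHPI data.

```python
def rebase_index(series, base_period):
    """Express every value relative to the base period, which becomes 100."""
    base = series[base_period]
    return {period: 100.0 * value / base for period, value in series.items()}


# Invented price levels keyed by YYYYMM, for illustration only.
prices = {"201612": 250.0, "201712": 260.0, "201812": 255.0}

print(rebase_index(prices, "201612"))
```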
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Housing Inventory: Price Reduced Count in the United States (PRIREDCOUUS) from Jul 2016 to Sep 2025 about reduced count, price, and USA.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This tool is a searchable data catalogue containing links to a range of official statistics on housing. It forms a part of the ONS Housing Statistics Portal.
This is the dataset used in this book: https://github.com/ageron/handson-ml/tree/master/datasets/housing to illustrate a sample end-to-end ML project workflow (pipeline). This is a great book - I highly recommend!
The data is based on California Census in 1990.
"This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.
The following is the description from the book author:
This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).
The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."
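One common way to handle the missing total_bedrooms values mentioned above is to fill them with the column median. The sketch below shows that single option on invented values; it is not presented as the book's chosen approach, and dropping the affected rows or the whole column are alternatives.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


# Invented total_bedrooms values with one missing entry.
total_bedrooms = [100, None, 300, 200]

print(impute_median(total_bedrooms))
```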
http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html
This is a dataset obtained from the StatLib repository. Here is the included description:
"We collected information on the variables using all the block groups in California from the 1990 Census. In this sample a block group on average includes 1425.5 individuals living in a geographically compact area. Naturally, the geographical area included varies inversely with the population density. We computed distances among the centroids of each block group as measured in latitude and longitude. We excluded all the block groups reporting zero entries for the independent and dependent variables. The final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."