http://opendatacommons.org/licenses/dbcl/1.0/
The dataset consists of house prices in King County, Washington, from sales between May 2014 and May 2015. Along with the house price, it includes information on 18 house features, the date of sale, and the ID of sale.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Housing Starts in the United States decreased to 1307 Thousand units in August from 1429 Thousand units in July of 2025. This dataset provides the latest reported value for - United States Housing Starts - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Existing Home Sales in the United States decreased to 4000 Thousand in August from 4010 Thousand in July of 2025. This dataset provides the latest reported value for - United States Existing Home Sales - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Average house prices are derived from data supplied by the mortgage lending agencies on loans approved by them rather than loans paid. When comparing house price figures from one period to another, account should be taken of the fact that changes in the mix of houses (including apartments) will affect the average figures. The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change. Excluding apartments, measured in €. Figure changed on 27/6/16 as revised data was received from the local authority.
The NYC Department of City Planning’s (DCP) Housing Database contains all NYC Department of Buildings (DOB) approved housing construction and demolition jobs filed or completed in NYC since January 1, 2010. It includes the three primary construction job types that add or remove residential units (new buildings, major alterations, and demolitions) and can be used to determine the change in legal housing units across time and space. Records in the Housing Database Project-Level Files are geocoded to the greatest level of precision possible, subject to numerous quality assurance and control checks, recoded for usability, and joined to other housing data sources relevant to city planners and analysts. Data are updated semiannually, at the end of the second and fourth quarters of each year. Please see DCP’s annual Housing Production Snapshot, which summarizes findings from the 21Q4 data release. Additional Housing and Economic analyses are also available.
The NYC Department of City Planning’s (DCP) Housing Database Unit Change Summary Files provide the net change in Class A housing units since 2010, and the count of units pending completion, for commonly used political and statistical boundaries (Census Block, Census Tract, City Council district, Community District, Community District Tabulation Area (CDTA), Neighborhood Tabulation Area (NTA)). These tables are aggregated from the DCP Housing Database Project-Level Files, which are derived from Department of Buildings (DOB) approved housing construction and demolition jobs filed or completed in NYC since January 1, 2010. Net housing unit change is calculated as the sum of all three construction job types that add or remove residential units: new buildings, major alterations, and demolitions. These files can be used to determine the change in legal housing units across time and space.
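The net-change calculation described above can be sketched as follows. This is a minimal illustration and not DCP's actual processing code: the record layout (`tract`, `job_type`, `units_change`) and the sample values are hypothetical. The only thing taken from the description is that net change is the sum of unit changes across new buildings, major alterations, and demolitions.

```python
from collections import defaultdict

# Hypothetical project-level records; field names and values are illustrative only.
jobs = [
    {"tract": "A", "job_type": "New Building", "units_change": 12},
    {"tract": "A", "job_type": "Demolition", "units_change": -3},
    {"tract": "A", "job_type": "Alteration", "units_change": 2},
    {"tract": "B", "job_type": "Demolition", "units_change": -1},
]

def net_unit_change(records, boundary="tract"):
    """Sum signed unit changes per boundary across all job types."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[boundary]] += rec["units_change"]
    return dict(totals)

summary = net_unit_change(jobs)
```

Grouping by a different key (for example a council district column, if one is present) would only require passing another `boundary` name.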
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Inspired by the quintessential House Prices Starter Competition and the popular Melbourne Housing Dataset, this dataset captures 4K+ condominium unit listings on the Malaysian housing website mudah.my.
As with the datasets above, your job is to predict house prices given certain parameters.
The data was scraped directly from the website using this data collection notebook. I might adapt the code to include houses as well in the future, but scraping the data takes a while because the scraper has to wait for the website to load and has to time out to account for Cloudflare's protections.
Note: This data is a lot less clean and organized than the data in the two datasets mentioned above. However, this is a good opportunity to practice data cleaning techniques, as this is something that is often overlooked on Kaggle. That being said, I made a starter notebook that goes through the data cleaning steps and outputs a fairly cleaned version of the dataset.
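As a small illustration of the kind of cleaning involved, the sketch below handles two quirks mentioned in this dataset's field descriptions: `Completion Year` is listed as `-` for buildings still under construction, and `Facilities` is a comma-separated string. It is not the starter notebook's code; the helper names and the sample row are made up for the example.

```python
def parse_completion_year(raw):
    """Return the year as int, or None when the listing shows '-' (under construction)."""
    raw = raw.strip()
    return None if raw in ("", "-") else int(raw)

def parse_facilities(raw):
    """Split the comma-separated facilities string into a clean list."""
    return [item.strip() for item in raw.split(",") if item.strip()]

# Hypothetical listing row; values are illustrative, not taken from the dataset.
row = {"Completion Year": "-", "Facilities": "Parking, Gymnasium , Swimming Pool"}
cleaned = {
    "completion_year": parse_completion_year(row["Completion Year"]),
    "facilities": parse_facilities(row["Facilities"]),
}
```

Mapping the placeholder to `None` rather than 0 keeps unfinished buildings from being mistaken for very old ones in later numeric work.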
description: The full (unfiltered) description for the unit listing.
Ad List: The ID of the listing on the website.
Category: The category of the listing. It will most likely be Apartment / Condominium.
Facilities: The facilities that the apartment has, in a comma-separated list.
Building Name: The name of the building.
Developer: The developer for the building.
Tenure Type: The type of tenure for the building.
Address: The address of the building. You can refer to this link for a description of what Malaysian addresses look like.
Completion Year: The completion year of the building. If the building is still under construction, this is listed as "-".
# of Floors: The number of floors in the building.
Total Units: The total number of units in the building.
Property Type: The type of property.
Bedroom: The number of bedrooms in the unit.
Bathroom: The number of bathrooms in the unit.
Parking Lot: The number of parking lots assigned to the unit, if any.
Floor Range: The floor range for the building.
Property Size: The size of the unit.
Land Title: The title given to the land. This link explains what land titles are.
Firm Type: The type of firm who posted the listing.
Firm Number: The ID of the firm who posted the listing.
REN Number: The REN number of the firm who posted the listing. Refer to this link for what REN numbers are.
price: The price of the unit. This is what you are trying to predict.
Nearby School/School: If there is a nearby school to the unit, which school it is.
Park: If there is a nearby park to the unit, which park it is.
Nearby Railway Station: If there is a nearby railway station to the unit, which railway station it is.
Bus Stop: If there is a nearby bus stop to the unit, which station it is.
Nearby Mall/Mall: If there is a nearby mall to the unit, which mall it is.
Highway: If there is a nearby highway to the unit, which highway it is.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table shows the average purchase price paid in the reporting period for existing own homes purchased by a private individual. The average purchase price of existing own homes may differ from the price index of existing own homes. The average purchase price is not an indicator of price developments of owner-occupied residential property. It reflects the average price of dwellings sold in a particular period; the fact that the dwellings sold differ from one period to another is not taken into account. The following example illustrates the problems caused by the continually changing quality of the dwellings sold. Suppose that in February of a particular year mainly big houses with extensive gardens, beautifully situated alongside canals, are sold, whereas in March many small terraced houses are sold. In that case the average purchase price in February will be higher than in March, but this does not mean that house prices have actually changed. See note 3 for a link to the article 'Why the average purchase price is not an indicator'.
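The composition effect described above can be made concrete with a small worked example. The prices and sales counts below are invented for illustration: no individual house price changes between the two months, yet the average differs because the mix of dwellings sold shifts.

```python
# Hypothetical prices per dwelling type, identical in both months (no price change).
price = {"canal_house": 900_000, "terraced_house": 300_000}

# Hypothetical numbers of dwellings sold; only the mix differs between months.
sold_feb = {"canal_house": 8, "terraced_house": 2}
sold_mar = {"canal_house": 2, "terraced_house": 8}

def average_purchase_price(sold, price):
    """Average price over all dwellings sold in a period."""
    total_value = sum(price[kind] * n for kind, n in sold.items())
    return total_value / sum(sold.values())

avg_feb = average_purchase_price(sold_feb, price)  # 780000.0
avg_mar = average_purchase_price(sold_mar, price)  # 420000.0
```

A price index avoids this by comparing like with like, which is why the two measures can diverge.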
Data available from: 1995
Status of the figures: The figures in this table are immediately definitive. The calculation of these figures is based on the number of notary transactions that are registered every month by the Dutch Land Registry Office (Kadaster). A revision of the figures is exceptional and occurs specifically if an error significantly exceeds the acceptable statistical margins. The average purchasing prices of existing owner-occupied sold homes can be calculated by Kadaster at a later date. These figures are usually the same as the publication on Statline, but in some periods they differ. Kadaster calculates the average purchasing prices based on the most recent data. These may have changed since the first publication. Statistics Netherlands uses figures from the first publication in accordance with the revision policy described above.
Changes as of 17 February 2025: Added average purchase prices of the municipalities for the year 2024.
When will new figures be published? New figures are published approximately one to three months after the period under review.
Dataset on housing prices in the Philippines, scraped from Lamudi in May 2023.
https://brightdata.com/license
Real estate datasets from various websites cover all major real estate data points including: property type, size, location, price, bedrooms, baths, address, history, images, and much more. Popular use cases include: forecast housing demand, analyze price fluctuations, improve customer satisfaction, see past prices to monitor market trends, and more.
https://brightdata.com/license
Enrich your real estate strategies and market insights with our comprehensive Seattle housing dataset. Analyzing this dataset can aid in understanding housing market dynamics and trends, empowering organizations to refine their investment strategies and business decisions. Access the entire dataset or tailor a subset to fit your requirements.
Popular use cases include optimizing investment strategies based on neighborhood engagement and property popularity, performing detailed user behavior analysis and segmentation by housing type, price range, and location to tailor marketing and engagement efforts, and identifying and forecasting emerging trends in the Seattle housing market to stay ahead in the competitive real estate industry.
Attribution 2.5 (CC BY 2.5) https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
This dataset contains material volumes (m3), material masses (kg), and material intensities (kg/m2) for representative buildings from 45 residential building cohorts in Finland. These data are presented per material and in total, aggregated on three hierarchical levels on the correspondingly named sheets in the OpenDocument Spreadsheet (ODS) file: the entire building (data_building), distinguished between vertical building levels (data_building_level), and distinguished between building parts (data_building part). Further details on the data are provided on the description sheet in the same file.
The cohorts are based on building type, main frame material, main façade material, and construction decade. Each cohort is represented by one inventoried building, covering the combinations in the tables below. All buildings are located in the city of Vantaa, Finland, with the exception of the 1940s and 1950s houses, which are based on type-planned houses and thus have no specific location.
Frame material, Facade material | 1940s | 1950s | 1960s | 1970s | 1980s | 1990s | 2000s | 2010s |
---|---|---|---|---|---|---|---|---|
Wood, Wood | D | D | D | T | D | D | D | D |
Wood, Brick | | T | T | D | T | T | T | T |
Brick, Brick | | | D | D | D* | D | D | D |
Concrete, Concrete | | | D | T | D | D | D | D |
Concrete, Brick | | | T | D | T | T | T | T |
Frame material, Facade material | 1940s | 1950s | 1960s | 1970s | 1980s | 1990s | 2000s | 2010s |
---|---|---|---|---|---|---|---|---|
Concrete, Concrete | | | D | D | D | D | D | D |
Concrete, Brick | | | T | T | T | T | T | T |
D = Direct record based on construction documents.
T = Theoretical variant with alternative façade material based on typical contemporary construction practice. All properties except the cladding and any related materials (e.g. battens) are identical with the corresponding direct record.
* Geometry determined based on construction documents from 1979 due to lack of suitably sized cases dated in the 1980s. Insulation thicknesses adjusted to match the represented decade’s building code.
The data are primarily based on digitized construction documents obtained from the Vantaa building inspection authority’s archives through the purchase portal Lupapiste Kauppa (https://kauppa.lupapiste.fi/). The 1940s and 1950s type-planned houses’ drawings are from the National Archives of Finland.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated for analyzing the economic impacts of subway networks on housing prices in metropolitan areas. The provision of transit networks and accompanying improvement in accessibility induce various impacts and we focused on the economic impacts realized through housing prices. As a proxy of housing price, we consider the price of condominiums, the dominant housing type in South Korea. Although our focus is transit accessibility and housing prices, the presented dataset is applicable to other studies. In particular, it provides a wide range of variables closely related to housing price, including housing properties, local amenities, local demographic characteristics, and control variables for the seasonality. Many of these variables were scientifically generated by our research team. Various distance variables were constructed in a geographic information system environment based on public data and they are useful not only for exploring environmental impacts on housing prices, but also for other statistical analyses in regard to real estate and social science research. The four metropolitan areas covered by the data—Busan, Daegu, Daejeon, and Gwangju—are independent of the transit systems of Greater Seoul, providing accurate information on the metropolitan structure separate from the capital city.
Local authorities compiling this data or other interested parties may wish to see notes and definitions for house building which includes P2 full guidance notes.
Data from live tables 253 and 253a is also published as Open Data (linked data format): http://opendatacommunities.org/def/concept/folders/themes/house-building
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Total Housing Inventory in the United States decreased to 1530 Thousands in August from 1550 Thousands in July of 2025. This dataset includes a chart with historical data for the United States Total Housing Inventory.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Older housing can impact the quality of the occupant's health in a number of ways, including lead exposure, housing quality, and factors that may exacerbate respiratory conditions, like asthma. Data from the U.S. Census Bureau contains Census Tract estimates of housing age, and Allegheny County assessment data provides parcel-level information on the year residential properties were built.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Median Sales Price of Houses Sold for the United States (MSPUS) from Q1 1963 to Q2 2025 about sales, median, housing, and USA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average house prices are derived from data supplied by the mortgage lending agencies on loans approved by them rather than loans paid. When comparing house price figures from one period to another, account should be taken of the fact that changes in the mix of houses (including apartments) will affect the average figures. The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change. Excluding apartments, measured in EUR. Figure changed on 27/6/16 as revised data was received from the local authority.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Development of the National Register for Social Housing (NROSH) was started by the Department for Communities and Local Government (DCLG) in 2004. NROSH aimed to be a database of all social housing properties in England, with a range of details captured on each property. NROSH was transferred to the Tenant Services Authority, the social housing regulator, in April 2010 and was discontinued in May 2011. Ownership of the latest NROSH dataset passed from the TSA to the Homes and Communities Agency (HCA) when responsibility for social housing regulation passed to the Regulation Committee of the HCA in April 2012. In addition to being out of date, the records submitted by social landlords to NROSH are of varying quantity and quality with many incomplete, inaccurate or missing records. The database may also contain a number of duplicate entries. Two datasets are available. One is the latest NROSH database held by the HCA as at May 2011. This release contains a large subset of the full NROSH dataset (48 from 201 fields in total; for 4,826,417 unique property records). The data in this release does not include those fields where data could enable specific identification of vulnerable people or other sensitive personal data. It also excludes fields where a minimum completion threshold is not met (generally fields where less than 25% of records have data). There are still issues of quality, incomplete data, and potential duplication of records in the data that accompanies this release that HCA is not able to resolve. Additional information, including data that falls below the minimum quality thresholds for this release, may be requested from the HCA (Referrals & Regulatory Enquiries Team, mail@homesandcommunities.co.uk). The 48 fields included in this release are summarised and described in the two tables accompanying this metadata. The data is contained in five compressed single CSV files: NROSH Data Extract Part 1; - 2; - 3; -4 and -5. 
Due to the large volume of records, analysis will require database software (MS Excel will not support analysis). Also available is a snapshot of the NROSH database held by DCLG as at March 2010. The data is that which was reported by social landlords in line with the system specifications and includes a selected set of fields on property address, type of accommodation, form of structure, and number of rooms and bedspaces.
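Because the extract is too large for a spreadsheet, one option alongside database software is to stream the CSV files row by row. The sketch below counts records and flags possible duplicates without loading a whole file into a table. The column name `property_id` and the sample data are hypothetical; the actual field names are given in the tables accompanying the release.

```python
import csv
import io
from collections import Counter

def scan_csv(handle, key_field):
    """Stream rows, returning (row_count, keys seen more than once)."""
    seen = Counter()
    rows = 0
    for record in csv.DictReader(handle):
        rows += 1
        seen[record[key_field]] += 1
    duplicates = sorted(k for k, n in seen.items() if n > 1)
    return rows, duplicates

# In-memory stand-in for one extract file; for a real file, pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("property_id,bedrooms\n1,2\n2,3\n1,2\n")
row_count, dupes = scan_csv(sample, "property_id")
```

Only the key counts are held in memory, so the same function can be run over each of the five parts in turn.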
https://dataverse.ada.edu.au/api/datasets/:persistentId/versions/3.5/customlicense?persistentId=doi:10.26193/IBL7PZ
Rental is Australia’s emerging tenure. Each year the proportion of Australians who rent increases, many of us will rent for life, and for the first time in generations there are now more renters than home owners. Though the rental sector is home to almost one-third of all Australians, researchers and policy-makers know little about conditions in this growing market because there is currently no systematic or reliable data. This project provides researchers and policy stakeholders with an essential database on Australia’s rental housing conditions. This data infrastructure will provide the knowledge base for national and international research and allow better urban, economic and social policy development. Building on The 2016 Australian Housing Conditions Dataset, in 2020 we collected data on the housing conditions of 15,000 rental households, covering all Australian states and territories. The project is funded by the Australian Research Council and The University of Adelaide, in partnership with the University of South Australia, the University of Melbourne, Swinburne University of Technology, Curtin University and Western Sydney University and is led by Professor Emma Baker at the University of Adelaide. The Australian Housing and Urban Research Institute provided funding for the focussed COVID-19 Module.
About the dataset (cleaned data)
The dataset (parquet file) contains approximately 1.5 million residential household sales from Denmark during the period from 1992 to 2024. All cleaned data is merged into one parquet file here on Kaggle. Note that some cleaning might still be necessary; see the notebook under Code.
A random sample (100k rows) of the dataset has also been added as a CSV file.
Done in Python version: 2.6.3.
Raw data
Raw data and more info are available in the GitHub repository: https://github.com/MartinSamFred/Danish-residential-housingPrices-1992-2024.git
The dataset has been scraped and cleaned (to some extent). Cleaned files are located in: \Housing_data_cleaned \ named DKHousingprices_1 and 2. Saved in parquet format (and saved as two files due to size).
Cleaning from raw files to above cleaned files is outlined in BoligsalgConcatCleanigGit.ipynb. (done in Python version: 2.6.3)
Webscraping script: Webscrape_script.ipynb (done in Python version: 2.6.3)
If you want to clean the raw files from scratch yourself:
Uncleaned scraped files (81 in total) are located in \Housing_data_raw \ Housing_data_batch1 and 2. Saved in .csv format and compressed as 7-zip files.
Additional files added/appended to the Cleaned files are located in \Addtional_data and named DK_inflation_rates, DK_interest_rates, DK_morgage_rates and DK_regions_zip_codes. Saved in .xlsx format.
Content
Each row in the dataset contains a residential household sale during the period 1992 - 2024.
“Cleaned files” columns:
0 'date': is the transaction date
1 'quarter': is the quarter based on a standard calendar year
2 'house_id': unique house id (could be dropped)
3 'house_type': can be 'Villa', 'Farm', 'Summerhouse', 'Apartment', 'Townhouse'
4 'sales_type': can be 'regular_sale', 'family_sale', 'other_sale', 'auction', '-' (“-“ could be dropped)
5 'year_build': range 1000 to 2024 (could be narrowed more)
6 'purchase_price': is purchase price in DKK
7 '%_change_between_offer_and_purchase': could differ negatively, be zero or positive
8 'no_rooms': number of rooms
9 'sqm': number of square meters
10 'sqm_price': 'purchase_price' divided by 'sqm'
11 'address': is the address
12 'zip_code': is the zip code
13 'city': is the city
14 'area': 'East & mid jutland', 'North jutland', 'Other islands', 'Capital, Copenhagen', 'South jutland', 'North Zealand', 'Fyn & islands', 'Bornholm'
15 'region': 'Jutland', 'Zealand', 'Fyn & islands', 'Bornholm'
16 'nom_interest_rate%': Danish nominal interest rate shown per quarter; however, the actual rate is not converted from annualized to quarterly
17 'dk_ann_infl_rate%': Danish annual inflation rate shown per quarter; however, the actual rate is not converted from annualized to quarterly
18 'yield_on_mortgage_credit_bonds%': 30 year mortgage bond rate (without spread)
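The parenthetical suggestions in the column list (dropping '-' sales types, narrowing 'year_build', and the annual rates that are not converted to quarterly) could be applied roughly as follows. This is a sketch on hand-made rows, not the notebook code from the repository; the 1800 cut-off for 'year_build' and the compound-rate conversion are assumptions chosen for illustration.

```python
def quarterly_rate(annual_pct):
    """Convert an annualized percentage rate to its compound quarterly equivalent."""
    return ((1 + annual_pct / 100) ** 0.25 - 1) * 100

def clean_sales(rows, min_year=1800, max_year=2024):
    """Drop '-' sales types and implausible build years; recompute sqm_price."""
    cleaned = []
    for row in rows:
        if row["sales_type"] == "-":
            continue
        if not min_year <= row["year_build"] <= max_year:
            continue
        fixed = dict(row)
        fixed["sqm_price"] = row["purchase_price"] / row["sqm"]
        cleaned.append(fixed)
    return cleaned

# Hypothetical rows using the documented column names; values are invented.
rows = [
    {"sales_type": "regular_sale", "year_build": 1975, "purchase_price": 2_000_000, "sqm": 100},
    {"sales_type": "-", "year_build": 1990, "purchase_price": 1_500_000, "sqm": 75},
    {"sales_type": "auction", "year_build": 1000, "purchase_price": 900_000, "sqm": 60},
]
kept = clean_sales(rows)
```

A simple division by four would be an alternative to the compound conversion; which one is appropriate depends on how the rates are used downstream.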
Uses
Various (statistical) analysis, visualisation and I assume machine learning as well.
Practice exercises etc.
Uncleaned scraped files are great for practicing cleaning, especially string cleaning. I’m not an expert, as seen in the coding ;-).
Disclaimer
The data and information in the data set provided here are intended to be used primarily for educational purposes. I do not own any of the data, and all rights are reserved to the respective owners as outlined in “Acknowledgements/sources”. The accuracy of the dataset is not guaranteed; accordingly, any analysis and/or conclusions are solely the user's own responsibility and accountability.
Acknowledgements/sources
All data is publicly available on:
Boliga: https://www.boliga.dk/
Finans Danmark: https://finansdanmark.dk/
Danmarks Statistik: https://www.dst.dk/da
Statistikbanken: https://statistikbanken.dk/statbank5a/default.asp?w=2560
Macrotrends: https://www.macrotrends.net/
PostNord: https://www.postnord.dk/
World Data: https://www.worlddata.info/
Dataset picture / cover photo: Nick Karvounis (https://unsplash.com/)
Have fun… :-)