5 datasets found
  1. Zillow Home Value Index (Updated Monthly)

    • kaggle.com
    zip
    Updated Oct 21, 2025
    Cite
    Rob Mulla (2025). Zillow Home Value Index (Updated Monthly) [Dataset]. https://www.kaggle.com/datasets/robikscube/zillow-home-value-index
    Download format: zip (273663 bytes)
    Authors
    Rob Mulla
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reference: https://www.zillow.com/research/zhvi-methodology/

    Official Background

    In setting out to create a new home price index, a major problem Zillow sought to overcome in existing indices was their inability to deal with the changing composition of properties sold in one time period versus another time period. Both a median sale price index and a repeat sales index are vulnerable to such biases (see the analysis here for an example of how influential the bias can be). For example, if expensive homes sell at a disproportionately higher rate than less expensive homes in one time period, a median sale price index will characterize this market as experiencing price appreciation relative to the prior period of time even if the true value of homes is unchanged between the two periods.

    The ideal home price index would be based on sale prices for the same set of homes in each time period, so that the sales mix never differs across periods. This approach of using a constant basket of goods is widely used, common examples being a commodity price index and a consumer price index. Unfortunately, unlike commodities and consumer goods, for which we can observe prices in all time periods, we can’t observe prices on the same set of homes in all time periods because not all homes are sold in every time period.
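
The composition bias and the constant-basket idea described above can be illustrated with a small sketch. The home labels and values are made up for illustration and are not taken from the dataset; the point is only that a median over sold homes can move when nothing but the sales mix changes, while a median over a fixed set of homes does not.

```python
from statistics import median

# Hypothetical market: true home values (in $1,000s) that are identical in both periods.
home_values = {"A": 150, "B": 200, "C": 250, "D": 600, "E": 800}

# Only the set of homes that happen to sell differs between the two periods.
sold_period_1 = ["A", "B", "C"]  # mostly less expensive homes sell
sold_period_2 = ["C", "D", "E"]  # mostly expensive homes sell

median_sale_1 = median(home_values[h] for h in sold_period_1)
median_sale_2 = median(home_values[h] for h in sold_period_2)

# Apparent appreciation reported by a median sale price index,
# although no individual home changed in value.
apparent_change = (median_sale_2 - median_sale_1) / median_sale_1

# A median over the same set of homes (the constant basket) is
# unaffected by which homes sold.
median_all = median(home_values.values())

print(median_sale_1, median_sale_2, apparent_change, median_all)
```

In this toy market the median sale price triples between periods purely because of which homes sold, which is the bias the description attributes to median sale price indices.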

    The innovation that Zillow developed in 2005 was a way of approximating this ideal home price index by leveraging the valuations Zillow creates on all homes (called Zestimates). Instead of actual sale prices on every home, the index is created from estimated sale prices on every home. While there is some estimation error associated with each estimated sale price (which we report here), this error is just as likely to be above the actual sale price of a home as below (in statistical terms, this is referred to as minimal systematic error). Because of this fact, the distribution of actual sale prices for homes sold in a given time period looks very similar to the distribution of estimated sale prices for this same set of homes. But, importantly, Zillow has estimated sale prices not just for the homes that sold, but for all homes even if they didn’t sell in that time period. From this data, a comprehensive and robust benchmark of home value trends can be computed which is immune to the changing mix of properties that sell in different periods of time (see Dorsey et al. (2010) for another recent discussion of this approach).

    For an in-depth comparison of the Zillow Home Value Index to the Case-Shiller Home Price Index, please refer to the Zillow Home Value Index Comparison to Case-Shiller.

    Each Zillow Home Value Index (ZHVI) is a time series tracking the monthly median home value in a particular geographical region. In general, each ZHVI time series begins in April 1996. We generate the ZHVI at eight geographic levels: neighborhood, ZIP code, city, congressional district, county, metropolitan area, state and the nation.
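
As a rough sketch of what "a time series tracking the monthly median home value" per region means, the snippet below groups estimated values by region and month and takes the median of each group. The record layout, region names, and values are hypothetical, and this is not Zillow's actual pipeline, which the description only summarizes.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (region, month, estimated home value in USD).
estimates = [
    ("Region A", "2024-01", 700_000),
    ("Region A", "2024-01", 800_000),
    ("Region A", "2024-01", 900_000),
    ("Region A", "2024-02", 710_000),
    ("Region A", "2024-02", 820_000),
    ("Region A", "2024-02", 930_000),
    ("Region B", "2024-01", 400_000),
    ("Region B", "2024-01", 500_000),
]


def median_value_series(records):
    """Return {region: {month: median estimated value}}."""
    grouped = defaultdict(list)
    for region, month, value in records:
        grouped[(region, month)].append(value)

    series = defaultdict(dict)
    # Sorting the keys keeps each region's months in chronological order.
    for (region, month), values in sorted(grouped.items()):
        series[region][month] = median(values)
    return dict(series)


series = median_value_series(estimates)
print(series["Region A"])
```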

    Underlying Data

    Estimated sale prices (Zestimates) are computed based on proprietary statistical and machine learning models. These models begin the estimation process by subdividing all of the homes in the United States into micro-regions, or subsets of homes either near one another or similar in physical attributes to one another. Within each micro-region, the models observe recent sale transactions and learn the relative contribution of various home attributes in predicting the sale price. These home attributes include physical facts about the home and land, prior sale transactions, tax assessment information and geographic location. Based on the patterns learned, these models can then estimate sale prices on homes that have not yet sold.

    The sale transactions from which the models learn patterns include all full-value, arm's-length sales that are not foreclosure resales. The purpose of the Zestimate is to give consumers an indication of the fair value of a home under the assumption that it is sold as a conventional, non-foreclosure sale. Similarly, the purpose of the Zillow Home Value Index is to give consumers insight into the home value trends for homes that are not being sold out of foreclosure status. Zillow research indicates that homes sold as foreclosures have typical discounts relative to non-foreclosure sales of between 20 and 40 percent, depending on the foreclosure saturation of the market. This is not to say that the Zestimate is not influenced by foreclosure resales. Zestimates are, in fact, influenced by foreclosure sales, but the pathway of this influence is through the downward pressure foreclosure sales put on non-foreclosure sale prices. It is the price signal observed in the latter that we are attempting to measure and, in turn, predict with the Zestimate.
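
The selection rule for training transactions stated above can be sketched as a simple filter. The `Sale` record and its field names are hypothetical stand-ins (the description does not specify a data schema), and the discount helper only restates the arithmetic behind the quoted 20 to 40 percent range.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    # Hypothetical fields; the description does not define a record layout.
    price: float
    full_value: bool
    arms_length: bool
    foreclosure_resale: bool


def eligible_for_training(sale: Sale) -> bool:
    """Keep full-value, arm's-length sales that are not foreclosure resales."""
    return sale.full_value and sale.arms_length and not sale.foreclosure_resale


def foreclosure_discount(foreclosure_price: float, comparable_price: float) -> float:
    """Fractional discount of a foreclosure sale relative to a comparable sale."""
    return 1 - foreclosure_price / comparable_price


sales = [
    Sale(300_000, True, True, False),   # conventional sale: kept
    Sale(210_000, True, True, True),    # foreclosure resale: excluded
    Sale(50_000, False, False, False),  # not full-value or arm's-length: excluded
]
training_sales = [s for s in sales if eligible_for_training(s)]
discount = foreclosure_discount(210_000, 300_000)  # roughly 0.30, inside the quoted range
print(len(training_sales), round(discount, 2))
```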

    Market Segments

    Within each region, we calculate the ZHVI for various subsets of homes (or mar...

  2. House Price Prediction Treated Dataset

    • kaggle.com
    zip
    Updated Oct 22, 2024
    Cite
    Vinicius Araujo (2024). House Price Prediction Treated Dataset [Dataset]. https://www.kaggle.com/datasets/aravinii/house-price-prediction-treated-dataset
    Download format: zip (286105 bytes)
    Authors
    Vinicius Araujo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PLEASE UPVOTE IF YOU LIKE THIS CONTENT! 😍

    Same dataset as "House Sales in King County, USA", but with treated content and with a split version (train-test) allowing direct use in machine learning models.

    The dataset has 14 columns, as follows:

    • date: Date of the home sale
    • price: Price of each home sold
    • bedrooms: Number of bedrooms
    • bathrooms: Number of bathrooms
    • living_in_m2: Square meters of the apartment's interior living space
    • nice_view: A flag that indicates the quality of a property's view
    • perfect_condition: A flag that indicates the maximum index of the apartment condition
    • grade: An index from 1 to 5, where 1 falls short of the quality level and 5 indicates a high quality level of construction and design
    • has_basement: A flag indicating whether or not a property has a basement
    • renovated: A flag indicating whether the property was renovated
    • has_lavatory: A flag indicating the presence of an incomplete or secondary bathroom (bathtub, sink, toilet)
    • single_floor: A flag indicating whether the property had only one floor
    • month: The month of the home sale
    • quartile_zone: A quartile distribution index of the most expensive zip codes, where 1 means least expensive and 4 most expensive.
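
One plausible reading of `quartile_zone` is sketched below: compute a median sale price per zip code, cut those medians into quartiles, and label each zip code from 1 (least expensive) to 4 (most expensive). This is an assumption about how such a column could be built, not the dataset author's documented procedure; the zip labels and prices are invented.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (zip code, sale price) pairs.
sales = [
    ("Z1", 100_000), ("Z1", 120_000), ("Z1", 110_000),
    ("Z2", 200_000), ("Z2", 220_000),
    ("Z3", 300_000),
    ("Z4", 400_000), ("Z4", 420_000),
]

# Median sale price per zip code.
prices_by_zip = defaultdict(list)
for zip_code, price in sales:
    prices_by_zip[zip_code].append(price)
zip_medians = {z: median(p) for z, p in prices_by_zip.items()}

# Three cut points that split the zip-level medians into four groups.
cut_points = quantiles(sorted(zip_medians.values()), n=4)

# Zone 1 = least expensive quartile, zone 4 = most expensive quartile.
quartile_zone = {z: bisect_left(cut_points, m) + 1 for z, m in zip_medians.items()}
print(quartile_zone)
```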
  3. USA House Prices

    • kaggle.com
    zip
    Updated Jul 21, 2024
    Cite
    Fırat Özcan (2024). USA House Prices [Dataset]. https://www.kaggle.com/datasets/fratzcan/usa-house-prices/code
    Download format: zip (121422 bytes)
    Authors
    Fırat Özcan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    United States
    Description

    Real estate markets are of great importance for both local and international investors, and economic and social factors have significant impacts on property prices. Below is a detailed description of each feature:

    1. Date: The date when the property was sold. This feature helps in understanding the temporal trends in property prices.
    2. Price: The sale price of the property in USD. This is the target variable we aim to predict.
    3. Bedrooms: The number of bedrooms in the property. Generally, properties with more bedrooms tend to have higher prices.
    4. Bathrooms: The number of bathrooms in the property. Similar to bedrooms, more bathrooms can increase a property’s value.
    5. Sqft Living: The size of the living area in square feet. Larger living areas are typically associated with higher property values.
    6. Sqft Lot: The size of the lot in square feet. Larger lots may increase a property’s desirability and value.
    7. Floors: The number of floors in the property. Properties with multiple floors may offer more living space and appeal.
    8. Waterfront: A binary indicator (1 if the property has a waterfront view, 0 otherwise). Properties with waterfront views are often valued higher.
    9. View: An index from 0 to 4 indicating the quality of the property’s view. Better views are likely to enhance a property’s value.
    10. Condition: An index from 1 to 5 rating the condition of the property. Properties in better condition are typically worth more.
    11. Sqft Above: The square footage of the property above the basement. This can help isolate the value contribution of above-ground space.
    12. Sqft Basement: The square footage of the basement. Basements may add value depending on their usability.
    13. Yr Built: The year the property was built. Older properties may have historical value, while newer ones may offer modern amenities.
    14. Yr Renovated: The year the property was last renovated. Recent renovations can increase a property’s appeal and value.
    15. Street: The street address of the property. This feature can be used to analyze location-specific price trends.
    16. City: The city where the property is located. Different cities have distinct market dynamics.
    17. Statezip: The state and zip code of the property. This feature provides regional context for the property.
    18. Country: The country where the property is located. While this dataset focuses on properties in the United States, this feature is included for completeness.
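
Since the Statezip feature combines two pieces of information, a small helper can split it for regional grouping. The sketch assumes values look like a two-letter state code and a ZIP code separated by whitespace (for example "WA 98133"); that format is an assumption and should be checked against the actual file.

```python
def split_statezip(statezip: str) -> tuple[str, str]:
    """Split a combined value such as "WA 98133" into (state, zip_code).

    The ZIP code is kept as a string so that leading zeros survive.
    Raises ValueError if the value does not have exactly two parts.
    """
    state, zip_code = statezip.split()
    return state, zip_code


state, zip_code = split_statezip("WA 98133")
print(state, zip_code)
```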

    If you like this dataset, please contribute by upvoting

  4. house_data

    • kaggle.com
    Updated Jul 27, 2022
    Cite
    Arathi P Raj (2022). house_data [Dataset]. https://www.kaggle.com/datasets/arathipraj/house-data
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arathi P Raj
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Content

    The dataset consists of prices of houses in King County, Washington, from sales between May 2014 and May 2015. Along with the house price, it contains information on 18 house features, the date of sale, and the ID of the sale.

    Attribute information

    1. id - Unique id for each home sold
    2. date - Date of the home sale
    3. price - Price of each home sold
    4. bedrooms - Number of bedrooms
    5. bathrooms - Number of bathrooms
    6. sqft_living - Square footage of the apartment's interior living space
    7. sqft_lot - Square footage of the land space
    8. floors - Number of floors
    9. waterfront - A dummy variable for whether the apartment was overlooking the waterfront or not
    10. view - An index from 0 to 4 of how good the view of the property was
    11. condition - An index from 1 to 5 on the condition of the apartment
    12. grade - An index from 1 to 13, where 1-3 falls short of building construction and design, 7 has an average level of construction and design, and 11-13 have a high quality level of construction and design
    13. sqft_above - The square footage of the interior housing space that is above ground level
    14. sqft_basement - The square footage of the interior housing space that is below ground level
    15. yr_built - The year the house was initially built
    16. yr_renovated - The year of the house's last renovation
    17. zipcode - The zipcode area the house is in
    18. lat - Latitude
    19. long - Longitude
    20. sqft_living15 - The square footage of interior housing living space for the nearest 15 neighbours
    21. sqft_lot15 - The square footage of the land lots of the nearest 15 neighbours
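
Result 2 above describes itself as a treated version of the King County data, so some of its columns can plausibly be derived from the attributes listed here. The sketch below shows one way that might work. It assumes underscore-joined column names (`sqft_living`, `sqft_basement`, and so on), that a `yr_renovated` of 0 means the house was never renovated, and that a basement area of 0 means no basement; none of these rules are confirmed by either dataset description.

```python
SQFT_TO_M2 = 0.09290304  # one square foot in square meters (0.3048 m squared)


def derive_treated_columns(row: dict) -> dict:
    """Derive treated-style flags from one raw King County record (assumed rules)."""
    return {
        "living_in_m2": row["sqft_living"] * SQFT_TO_M2,
        "has_basement": row["sqft_basement"] > 0,   # assumption: 0 means no basement
        "renovated": row["yr_renovated"] > 0,       # assumption: 0 means never renovated
        "single_floor": row["floors"] == 1,
        "perfect_condition": row["condition"] == 5, # 5 is the top of the 1 to 5 index
    }


# Hypothetical record using only the fields the sketch needs.
raw = {"sqft_living": 1000, "sqft_basement": 0, "yr_renovated": 1991,
       "floors": 1, "condition": 5}
treated = derive_treated_columns(raw)
print(treated)
```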
  5. Assisted living costs in Philadelphia, PA, over time

    • aplaceformom.com
    html
    Updated Dec 14, 2023
    Cite
    (2023). Assisted living costs in Philadelphia, PA, over time [Dataset]. https://www.aplaceformom.com/assisted-living/pennsylvania/philadelphia
    Download format: html
    Area covered
    Philadelphia, Pennsylvania
    Description

    Cost comparison table showing 2023 and 2024 median costs by location


Zillow Home Value Index (Updated Monthly)

Updated monthly; pulled from the FRED API
