8 datasets found
  1. Zillow Dataset

    • kaggle.com
    Updated Nov 18, 2019
    Phillip Zhang (2019). Zillow Dataset [Dataset]. https://www.kaggle.com/datasets/phillipzhang/zillow-dataset
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Phillip Zhang
    Description

    Dataset

    This dataset was created by Phillip Zhang

    Contents

    Zillow data across different states and time periods

  2. Zillow House Price Data

    • kaggle.com
    zip
    Updated Aug 11, 2020
    Paul Mooney (2020). Zillow House Price Data [Dataset]. https://www.kaggle.com/paultimothymooney/zillow-house-price-data
    Available download formats: zip (130,220,265 bytes)
    Authors
    Paul Mooney
    Description

    Context

    Zillow publishes a large amount of data on housing prices in the United States.

    Content

    Data on housing prices and rental prices, broken down by city, state, and number of bedrooms. More detail is available at https://www.zillow.com/research/data/ and at https://www.zillow.com/research/home-sales-methodology-7733/.

    Acknowledgements

    The data was downloaded from https://www.zillow.com/research/data/. Banner photo by Ian Keefe on Unsplash. The dataset license is described at https://www.zillow.com/research/data/.

  3. My Zillow Predictions

    • kaggle.com
    Updated Oct 22, 2019
    Andy Harless (2019). My Zillow Predictions [Dataset]. https://www.kaggle.com/aharless/my-zillow-predictions/notebooks
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Andy Harless
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    The Zillow competition

    Content

    Predictions from my team's final models in the Zillow competition. Two of the models (the two XGB models, which used the same code) contain a bug: for 2017 I summed the results of 10 different models and forgot to divide by 10 to get an average. The correct values for those models can be obtained by dividing their 2017 predictions by 10.
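    The correction above amounts to dividing every 2017 column of the affected prediction files by 10. A minimal sketch, assuming the files use a Zillow Prize style layout with a `ParcelId` column plus one column per forecast month named `YYYYMM` (for example `201710`); that layout and the function name are assumptions, not something this dataset page confirms:

```python
import csv
import io


def fix_2017_predictions(src, dst, divisor=10.0):
    """Copy a prediction CSV, dividing every 2017 column by `divisor`.

    Assumed layout: a ParcelId column plus one column per month (YYYYMM).
    Only columns whose names start with "2017" are rescaled.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in reader.fieldnames:
            if col.startswith("2017"):
                # The buggy XGB models summed 10 runs instead of averaging them.
                row[col] = str(float(row[col]) / divisor)
        writer.writerow(row)


if __name__ == "__main__":
    buggy = io.StringIO("ParcelId,201610,201710\n1,0.01,0.25\n")
    fixed = io.StringIO()
    fix_2017_predictions(buggy, fixed)
    print(fixed.getvalue())
```

    Only the two XGB models need this; applying it to the other models' files would shrink their correct 2017 values.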

    Acknowledgements

    Thanks to Zillow, Kaggle, my teammate Roberto Ruiz, Scirpus (creator of the genetic programming features I used), Μαριος Μιχαηλιδης (a.k.a. KazAnova, creator of StackNet), and all those whose public kernels have given me inspiration and code structures. And as long as I'm using Greek letters, how about Homer's Muse: Μηνιν αειδε θεα Πηληιαδεω Αχιληος ("Sing, goddess, the wrath of Achilles, son of Peleus").

    Inspiration

    Because of the bug, my submitted results will do very badly in the final rankings, but the results of the individual models may be useful. And at least they serve to document what I produced. I have no chance of making it to the second stage of the competition, but perhaps I can salvage some minor, unofficial bragging rights.

  4. Median Home Sale Price

    • kaggle.com
    zip
    Updated Feb 14, 2020
    Kyle Thomas (2020). Median Home Sale Price [Dataset]. https://www.kaggle.com/datasets/kylet550/median-home-sale-price
    Available download formats: zip (922,052 bytes)
    Authors
    Kyle Thomas
    Description

    Context

    Zillow provides data on sold homes, including sales counts (with a detailed methodology), median sale prices for various housing types, and foreclosure resales, given as the share of all sales in which the home had previously been foreclosed upon. It also provides current and historical listings data, ranging from median list prices and inventory counts to the share of listings with a price cut, median price-cut size, age of inventory, and the number of days a listing spent on Zillow before the sale was final. See below for the full list with definitions.

    Acknowledgements

    Data courtesy of Zillow

  5. Historical House Prices across the U.S.

    • kaggle.com
    Updated Dec 6, 2024
    Luis Heitor Ribeiro (2024). Historical House Prices across the U.S. [Dataset]. https://www.kaggle.com/datasets/luisheitorribeiro/historical-house-prices-across-u-s-regions/versions/1
    Dataset provided by
    Kaggle
    Authors
    Luis Heitor Ribeiro
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    United States
    Description

    This dataset from Zillow contains a smoothed, seasonally adjusted measure of the typical home value and market changes across various regions in the United States from January 2000 to August 2022. The data includes monthly observations for hundreds of cities and states, offering insight into regional housing market trends over two decades.

  6. House Price Predication

    • kaggle.com
    Updated May 7, 2024
    Sheema Zain (2024). House Price Predication [Dataset]. https://www.kaggle.com/datasets/sheemazain/house-price-predication
    Dataset provided by
    Kaggle
    Authors
    Sheema Zain
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    House price prediction

    Predicting house prices is a common task in data science and machine learning. Here's a high-level overview of how you might approach it:

    Data Collection: Gather a dataset containing features of houses (e.g., size, number of bedrooms, location, amenities) and their corresponding prices. Websites like Zillow, Kaggle, or government housing datasets are good sources.

    Data Preprocessing: Clean the data by handling missing values, encoding categorical variables, and scaling numerical features if necessary. This step ensures that the data is in a suitable format for training a model.

    Feature Selection/Engineering: Choose relevant features that are likely to influence house prices. You may also create new features based on domain knowledge or data analysis.
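    The three preprocessing steps named above can be sketched in plain Python. This is a minimal illustration, assuming each house is a dict and `None` marks a missing value; median imputation, one-hot encoding, and z-score standardization are common choices picked here for illustration, not the only options, and the column names are hypothetical:

```python
from statistics import mean, median, pstdev


def impute_median(rows, col):
    """Replace missing (None) values in a numeric column with the column median."""
    fill = median(r[col] for r in rows if r[col] is not None)
    for r in rows:
        if r[col] is None:
            r[col] = fill
    return fill


def one_hot(rows, col):
    """Replace a categorical column with 0/1 indicator columns named "col=value"."""
    categories = sorted({r[col] for r in rows})
    for r in rows:
        value = r.pop(col)
        for c in categories:
            r[f"{col}={c}"] = 1 if value == c else 0
    return categories


def standardize(rows, col):
    """Scale a numeric column to zero mean and unit (population) standard deviation."""
    values = [r[col] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[col] = (r[col] - mu) / sigma if sigma else 0.0
    return mu, sigma


if __name__ == "__main__":
    houses = [
        {"sqft": 1000, "bedrooms": 2, "city": "A"},
        {"sqft": None, "bedrooms": 3, "city": "B"},
        {"sqft": 2000, "bedrooms": 4, "city": "A"},
    ]
    impute_median(houses, "sqft")  # fills the gap with 1500.0
    one_hot(houses, "city")
    standardize(houses, "sqft")
    print(houses)
```

    In a real pipeline the median, category list, mean, and standard deviation should be computed on the training split only and then reused on the test split, so no information leaks from the test data.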

    Model Selection: Select a regression model suitable for predicting continuous target variables like house prices. Common choices include Linear Regression, Decision Trees, Random Forests, Gradient Boosting, and Neural Networks.
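    As the simplest of the listed choices, linear regression with a single feature has a closed-form least-squares solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal sketch, assuming one numeric feature such as living area; the numbers below are hypothetical:

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x for one feature."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted price for a feature value x."""
    return intercept + slope * x


if __name__ == "__main__":
    sqft = [1000, 1500, 2000]             # hypothetical sizes
    price = [200000, 275000, 350000]      # hypothetical prices
    b0, b1 = fit_simple_linear_regression(sqft, price)
    print(b0, b1)                         # 50000.0 150.0
    print(predict(b0, b1, 1800))          # 320000.0
```

    With several features the same idea generalizes to solving the normal equations, which libraries handle for you; tree-based models and neural networks have no closed form and are trained iteratively.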

    Model Training: Split your dataset into training and testing sets to train and evaluate the performance of your model. You can further split the training set for validation purposes or use cross-validation techniques.
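    A shuffled train/test split and k-fold cross-validation can both be expressed as index bookkeeping. A minimal sketch in plain Python; the 20% test fraction and the fixed seed are illustrative defaults, not recommendations from the dataset author:

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows`; the first n_test shuffled rows become the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n items.

    Fold sizes differ by at most one; each index appears in exactly one validation fold.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


if __name__ == "__main__":
    train, test = train_test_split(list(range(10)), seed=42)
    print(len(train), len(test))  # 8 2
    for train_idx, val_idx in k_fold_indices(10, 3):
        print(val_idx)
```

    The folds here are contiguous, so shuffle the data first if it is sorted (for example by date or region); for time-ordered housing data a time-based split may be more honest than a random one.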

    Model Evaluation: Assess the performance of your model using appropriate evaluation metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE).
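    The three metrics follow directly from their definitions: MAE averages the absolute errors, MSE averages the squared errors, and RMSE is the square root of MSE, which puts it back in the units of the price. A minimal sketch with hypothetical prices in thousands of dollars:

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


if __name__ == "__main__":
    y_true = [100, 200, 300, 400]  # hypothetical prices (thousands)
    y_pred = [110, 190, 300, 420]
    print(mae(y_true, y_pred))     # 10.0
    print(mse(y_true, y_pred))     # 150.0
    print(rmse(y_true, y_pred))    # about 12.25
```

    RMSE exceeds MAE here because squaring gives the single 20-unit miss more weight than the two 10-unit misses; the gap between the two metrics hints at how much large errors dominate.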

    Hyperparameter Tuning: Fine-tune your model's hyperparameters to improve its performance. Techniques like grid search or random search can be employed for this purpose.
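    Grid search simply evaluates every combination of candidate hyperparameter values and keeps the best one. A minimal sketch, assuming you supply a `score_fn` that trains a model with the given parameters and returns a validation error (lower is better); the parameter names and the toy score function below are hypothetical stand-ins for a real training run:

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in `param_grid` and return (best_params, best_score).

    `param_grid` maps parameter names to lists of candidate values;
    `score_fn(params)` returns a validation error, where lower is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    grid = {"max_depth": [2, 4, 6], "learning_rate": [0.1, 0.3]}
    # Toy stand-in for "train, then measure validation error".
    toy_error = lambda p: (p["max_depth"] - 4) ** 2 + p["learning_rate"]
    print(grid_search(grid, toy_error))
```

    Random search differs only in sampling a fixed number of combinations instead of enumerating all of them, which scales better when the grid is large.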

    Deployment: Once satisfied with your model's performance, deploy it to make predictions on new data. This could be as simple as saving the trained model and creating an interface for users to input house features.
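    For a model as small as a one-feature linear regression, "saving the trained model" can be as simple as writing its coefficients to JSON and reading them back at prediction time. A minimal sketch, assuming the hypothetical intercept/slope model used for illustration here; larger models would normally be saved with their library's own serialization:

```python
import json
import os
import tempfile


def save_model(path, intercept, slope):
    """Persist the fitted coefficients as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"intercept": intercept, "slope": slope}, f)


def load_model(path):
    """Load coefficients saved by save_model; returns (intercept, slope)."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    return params["intercept"], params["slope"]


def predict_price(model, sqft):
    """Predict a price from a (intercept, slope) model and a living area."""
    intercept, slope = model
    return intercept + slope * sqft


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "model.json")
        save_model(path, 50000.0, 150.0)
        print(predict_price(load_model(path), 1800))  # 320000.0
```

    A user-facing interface would then call `load_model` once at startup and `predict_price` for each set of house features submitted.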

  7. Negative Equity in U.S. Housing Market

    • kaggle.com
    Updated Jan 10, 2023
    The citation is currently not available for this dataset.
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Negative Equity in U.S. Housing Market

    Measuring Home Values, Debt, and Credit Risk

    By Zillow Data [source]

    About this dataset

    This dataset, Negative Equity in the US Housing Market, provides an in-depth look at negative equity across the United States during a single quarter (2017 Q1, judging by the file name NESummary_2017Q1_Public.csv). Included metrics are the total amount of negative equity in millions of dollars, the total number of homes in negative equity, the percentage of homes with mortgages that are in negative equity, and more. These data points provide insight into regional and national trends in the prevalence and rate of mortgage delinquency stemming from declines in home value from peak levels.
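    A home is in negative equity ("underwater") when the outstanding mortgage balance exceeds the home's value. The headline metrics above can be illustrated from per-home data; this is a minimal sketch, assuming hypothetical (home value, loan balance) pairs. The dataset itself ships pre-aggregated regional figures, so this shows the definition, not Zillow's actual computation:

```python
def negative_equity_summary(homes):
    """Summarize negative equity for homes given as (home_value, loan_balance) pairs.

    Only homes with a mortgage (loan_balance > 0) count toward the percentage.
    """
    mortgaged = [(v, b) for v, b in homes if b > 0]
    underwater = [(v, b) for v, b in mortgaged if b > v]
    total_negative_equity = sum(b - v for v, b in underwater)
    share = len(underwater) / len(mortgaged) if mortgaged else 0.0
    return {
        "homes_in_negative_equity": len(underwater),
        "pct_mortgaged_in_negative_equity": 100.0 * share,
        "total_negative_equity": total_negative_equity,
    }


if __name__ == "__main__":
    homes = [
        (200000, 150000),  # positive equity
        (180000, 210000),  # underwater by 30,000
        (300000, 0),       # owned outright, excluded from the percentage
        (100000, 130000),  # underwater by 30,000
    ]
    print(negative_equity_summary(homes))
```

    Equivalently, a mortgaged home is underwater when its loan-to-value ratio (balance divided by value) is above 1.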

    Home types available for analysis include 'all homes', condos/co-ops, multifamily buildings with five or more housing units, and duplexes/triplexes. Cash-buyer rates for particular areas can also be determined from this collection. Further metrics, such as mortgage affordability and impacts on overall indebtedness, can be calculated using information related to Zillow's Home Value Index (ZHVI) forecast methodology and TransUnion data, respectively.

    Other variables in this dataset include region type (i.e., city, county, etc.), size rank based on population, percentage change in ZHVI since peak levels, and loan-to-value ratio greater than 200 across all regions covered here (NE). Zillow's own Secondary Mortgage Market Survey data is used to obtain average mortgage quote rates, while Census Bureau NCHS median household income figures represent typical proportions between wages and debt obligations. Whether you want to assess effects at the metro level or in detail by ZIP code, this dataset should support insightful exploration. Users must strictly adhere to all conditions in the lead provider's Terms of Use before accessing any resources included here.


    Research Ideas

    • Analyzing regional and state trends in negative equity: Analyze geographic differences in the percentage of mortgages “underwater”, total amount of negative equity, number of homes at least 90 days late, and other key indicators to provide insight into the factors influencing negative equity across regions, states and cities.
    • Tracking the recovery rate over time: Track short-term changes in numbers related to negative equity (e.g., region or area ZHVI Change from Peak) to monitor recovery rates over time as well as how different policy interventions are affecting homeownership levels in affected areas.
    • Exploring best practices for promoting housing affordability: Compare affordability metrics (e.g., mortgage payments, price-to-income ratios) across different geographic locations over time to identify best practices for empowering homeowners and promoting stability within the housing market, while reducing local inequality in the availability of affordable housing and in access to credit such as mortgages and loans.
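    For the recovery-tracking idea, "change from peak" is the percentage by which the current value sits below the highest value seen so far. A minimal sketch over a hypothetical monthly ZHVI series; the dataset's own "ZHVI change from peak" column may be defined differently (for example as a fraction rather than a percent, or against a fixed peak date), so treat this as an assumption:

```python
def change_from_peak(series):
    """Percent change of the latest value relative to the series maximum."""
    peak = max(series)
    return 100.0 * (series[-1] - peak) / peak


def running_change_from_peak(series):
    """Percent change of each value relative to the highest value seen up to that point."""
    out, peak = [], float("-inf")
    for value in series:
        peak = max(peak, value)
        out.append(100.0 * (value - peak) / peak)
    return out


if __name__ == "__main__":
    zhvi = [100000, 120000, 150000, 130000, 135000]  # hypothetical monthly values
    print(change_from_peak(zhvi))          # -10.0
    print(running_change_from_peak(zhvi))  # 0.0 until the peak, then negative
```

    A rising running series after the trough (here from about -13.3% back to -10.0%) is the recovery signal the research idea describes.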

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: NESummary_2017Q1_Public.csv

    | Column name | Description |
    |:------------|:------------|
    | RegionType | The type of region (e.g., city, county, metro, etc.) (String) |
    | City | Name of the city (String) |
    | County | Name of the county (String) |
    | State | Name of the state (String) |
    | Metro ...
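    The file can be read with Python's standard `csv` module using the column names listed above. A minimal sketch that filters rows by `RegionType` and counts them per `State`; the exact `RegionType` values (such as `"City"`) and the sample rows are assumptions, so check the real file before relying on them:

```python
import csv
import io
from collections import Counter


def rows_by_region_type(f, region_type):
    """Return rows from a NESummary-style CSV whose RegionType equals `region_type`."""
    return [row for row in csv.DictReader(f) if row["RegionType"] == region_type]


def count_by_state(rows):
    """Count rows per State."""
    return Counter(row["State"] for row in rows)


if __name__ == "__main__":
    # Hypothetical sample; with the real file use
    # open("NESummary_2017Q1_Public.csv", newline="", encoding="utf-8").
    sample = io.StringIO(
        "RegionType,City,County,State\n"
        "City,Springfield,Sangamon,IL\n"
        "County,,Cook,IL\n"
        "City,Austin,Travis,TX\n"
    )
    cities = rows_by_region_type(sample, "City")
    print(count_by_state(cities))
```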

  8. Data from: Neighborhoods in New York

    • kaggle.com
    zip
    Updated Jul 23, 2017
    Jack Cook (2017). Neighborhoods in New York [Dataset]. https://www.kaggle.com/jackcook/neighborhoods-in-new-york
    Available download formats: zip (1,069,387 bytes)
    Authors
    Jack Cook
    Area covered
    New York
    Description

    Context

    This dataset contains shapefiles outlining 558 neighborhoods in 50 major cities in New York state, notably including Albany, Buffalo, Ithaca, New York City, Rochester, and Syracuse. This adds context to your datasets by identifying the neighborhood of any locations you have, as coordinates on their own don't carry a lot of information.
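    Identifying the neighborhood of a location is a point-in-polygon lookup. A minimal sketch using the standard ray-casting test, assuming the polygon rings have already been extracted from the shapefile as lists of (longitude, latitude) pairs in the same coordinate system as your points; the neighborhood names and shapes below are hypothetical, and holes and multi-part polygons are ignored, so a GIS library is the safer choice for real work:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def find_neighborhood(lon, lat, neighborhoods):
    """Return the name of the first polygon containing the point, or None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


if __name__ == "__main__":
    neighborhoods = {
        "Square": [(0, 0), (1, 0), (1, 1), (0, 1)],  # hypothetical shapes
        "Other": [(2, 0), (3, 0), (3, 1), (2, 1)],
    }
    print(find_neighborhood(0.5, 0.5, neighborhoods))  # Square
    print(find_neighborhood(5, 5, neighborhoods))      # None
```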

    Content


    Four files are included containing data about the shapes: an SHX file, a DBF file, an SHP file, and a PRJ file. All of them must be included in your input data, as each contains part of the data; no single file has everything you need.
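    Because a missing component is an easy mistake, a quick pre-flight check can confirm that all four files sit next to each other. A minimal sketch, assuming the files share one base name (the name `neighborhoods` below is hypothetical); note that the extension match is case-sensitive on most Linux filesystems:

```python
import tempfile
from pathlib import Path

# The four components listed in the dataset description.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_shapefile_parts(shp_path):
    """Return the component extensions that are missing next to `shp_path`."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not base.with_suffix(ext).exists()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for ext in (".shp", ".shx", ".dbf"):
            Path(d, "neighborhoods" + ext).touch()
        print(missing_shapefile_parts(Path(d, "neighborhoods.shp")))  # ['.prj']
```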

    Since none of these files are plain text, setting them up can be a little difficult. I highly recommend using mapshaper.org to get started: the site shows the boundaries drawn on a plane and lets you export the files in a number of other formats (e.g., GeoJSON, CSV) if you cannot use them in the format provided. Personally, though, I have found the shapefile format easier to work with.

    To get started with the shapefile in R, you can use the rgdal and rgeos packages. For an example of these packages in use, check out my kernel, "Incorporating neighborhoods into your model".

    Acknowledgements

    These files were provided by Zillow and are available under a Creative Commons license.


    Inspiration

    I'll be using these in the NYC Taxi Trip Duration competition to add context to the pickup and dropoff locations of the taxi rides and hopefully greatly improve my predictions.
