8 datasets found

Zillow Dataset
kaggle.com
Updated Nov 18, 2019
Cite
Phillip Zhang (2019). Zillow Dataset [Dataset]. https://www.kaggle.com/datasets/phillipzhang/zillow-dataset
Dataset updated
Nov 18, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Phillip Zhang
Description
Dataset

This dataset was created by Phillip Zhang

Contents

Zillow data in different states and times
Zillow House Price Data
kaggle.com
Updated Aug 11, 2020
Cite
Paul Mooney (2020). Zillow House Price Data [Dataset]. https://www.kaggle.com/paultimothymooney/zillow-house-price-data
Available download formats: zip (130,220,265 bytes)
Dataset updated
Aug 11, 2020
Authors
Paul Mooney
Description
Context

Zillow publishes a large amount of data on housing prices in the United States.

Content

Data on housing prices and rental prices, broken down by city, state, and number of bedrooms. More detail can be found at https://www.zillow.com/research/data/ and at https://www.zillow.com/research/home-sales-methodology-7733/.

Acknowledgements

The data was downloaded from https://www.zillow.com/research/data/. Banner photo from Ian Keefe on Unsplash. Dataset license described at https://www.zillow.com/research/data/.
My Zillow Predictions
kaggle.com
Updated Oct 22, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andy Harless (2019). My Zillow Predictions [Dataset]. https://www.kaggle.com/aharless/my-zillow-predictions/notebooks
Dataset updated
Oct 22, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Andy Harless
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Context

The Zillow competition

Content

Predictions from my team's final models in the Zillow competition. One of the models (actually two: the two XGB models, which used the same code) contains a bug. I summed the results of 10 different models for 2017 and forgot to divide by 10 to get the average, so the correct values for those models can be obtained by dividing their 2017 predictions by 10.
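The fix described above is plain arithmetic: an ensemble average is the sum of the model outputs divided by the number of models, so a sum that was never divided can be repaired by dividing afterwards. A minimal Python sketch, assuming each model's 2017 predictions are a list of floats aligned row by row; the function names are hypothetical and not taken from the author's code:

```python
def average_predictions(model_outputs):
    """Average row-aligned predictions from several models: sum, then divide by the count."""
    n_models = len(model_outputs)
    n_rows = len(model_outputs[0])
    return [sum(output[i] for output in model_outputs) / n_models for i in range(n_rows)]


def correct_summed_predictions(summed, n_models=10):
    """Recover the intended average from predictions that were summed but never divided."""
    return [value / n_models for value in summed]


# Hypothetical outputs from 10 models for two properties.
outputs = [[0.1 * k, 1.0] for k in range(10)]
buggy_sum = [sum(column) for column in zip(*outputs)]  # what the bug produced
fixed = correct_summed_predictions(buggy_sum)          # matches average_predictions(outputs)
```

Because division is applied element-wise, the correction can be done after the fact on the submitted file without rerunning any model.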

Acknowledgements

Thanks to Zillow, Kaggle, my teammate Roberto Ruiz, Scirpus (creator of the genetic programming features I used), Μαριος Μιχαηλιδης (a.k.a. KazAnova, creator of StackNet), and all those whose public kernels have given me inspiration and code structures. And as long as I'm using Greek letters, how about Homer's Muse: Μηνιν αειδε θεα Πηληιαδεω Αχιληος ("Sing, goddess, the wrath of Achilles, son of Peleus").

Inspiration

Because of the bug, my submitted results will do very badly in the final rankings, but the results of the individual models may be useful. And at least they serve to document what I produced. I have no chance of making it to the second stage of the competition, but perhaps I can salvage some minor, unofficial bragging rights.
Median Home Sale Price
kaggle.com
Updated Feb 14, 2020
Cite
Kyle Thomas (2020). Median Home Sale Price [Dataset]. https://www.kaggle.com/datasets/kylet550/median-home-sale-price
Available download formats: zip (922,052 bytes)
Dataset updated
Feb 14, 2020
Authors
Kyle Thomas
Description
Context

Zillow provides data on sold homes, including sales counts (for which there is a detailed methodology), median sale prices for various housing types, and foreclosures, expressed as the share of all sales in which the home had previously been foreclosed upon. Current and historical listings data are also available, ranging from median list prices and inventory counts to the share of listings with a price cut, the median price-cut size, the age of inventory, and the number of days a listing spent on Zillow before the sale was final. See below for the full list with definitions.

Acknowledgements

Data courtesy of Zillow
Historical House Prices across the U.S.
kaggle.com
Updated Dec 6, 2024
Cite
Luis Heitor Ribeiro (2024). Historical House Prices across the U.S. [Dataset]. https://www.kaggle.com/datasets/luisheitorribeiro/historical-house-prices-across-u-s-regions/versions/1
Dataset updated
Dec 6, 2024
Dataset provided by
Kaggle
Authors
Luis Heitor Ribeiro
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Area covered
United States
Description
This dataset from Zillow contains a smoothed, seasonally adjusted measure of typical home values and market changes across various regions of the United States from January 2000 to August 2022. The data include monthly observations for hundreds of cities and states, offering insight into regional housing-market trends over two decades.
House Price Predication
kaggle.com
Updated May 7, 2024
Cite
Sheema Zain (2024). House Price Predication [Dataset]. https://www.kaggle.com/datasets/sheemazain/house-price-predication
Dataset updated
May 7, 2024
Dataset provided by
Kaggle
Authors
Sheema Zain
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
House price prediction

Predicting house prices is a common task in data science and machine learning. Here's a high-level overview of how you might approach it:

Data Collection: Gather a dataset containing features of houses (e.g., size, number of bedrooms, location, amenities) and their corresponding prices. Good sources include websites such as Zillow and Kaggle, as well as government housing datasets.

Data Preprocessing: Clean the data by handling missing values, encoding categorical variables, and scaling numerical features if necessary. This step ensures that the data is in a suitable format for training a model.

Feature Selection/Engineering: Choose relevant features that are likely to influence house prices. You may also create new features based on domain knowledge or data analysis.
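The three preprocessing operations named in the Data Preprocessing step (handling missing values, encoding categorical variables, scaling numerical features) can be sketched in plain Python. This is a minimal illustration that assumes median imputation, one-hot encoding, and z-score standardization; these are common choices, not the only ones, and the function names are hypothetical:

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Encode category labels as 0/1 indicator columns, one column per label (sorted order)."""
    labels = sorted(set(categories))
    rows = [[1 if c == label else 0 for label in labels] for c in categories]
    return labels, rows


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical raw columns.
sqft = impute_median([1200, None, 1800, 1500])                  # None becomes 1500
home_type_labels, home_type = one_hot(["condo", "house", "condo"])
sqft_scaled = standardize([float(v) for v in sqft])
```

In practice, the median, the label list, and the mean and standard deviation should be computed on the training set only and then reused on the test set, so that no information from the test data leaks into training.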

Model Selection: Select a regression model suitable for predicting continuous target variables like house prices. Common choices include Linear Regression, Decision Trees, Random Forests, Gradient Boosting, and Neural Networks.
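Linear Regression, the simplest model in that list, has a closed-form least-squares fit when there is a single feature: the slope is the covariance of feature and price divided by the variance of the feature, and the intercept makes the line pass through the means. A minimal sketch, assuming one numeric feature (e.g., square footage) and hypothetical data:

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to new feature values."""
    return [intercept + slope * xi for xi in x]


# Hypothetical square footage and sale prices.
slope, intercept = fit_simple_linear_regression([1000, 1500, 2000], [200000, 275000, 350000])
```

For this toy data the fit is exact (150 dollars per square foot plus a 50,000 dollar intercept). Tree ensembles, gradient boosting, and neural networks have no such closed form and are fitted iteratively.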

Model Training: Split your dataset into training and testing sets to train and evaluate the performance of your model. You can further split the training set for validation purposes or use cross-validation techniques.
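The train/test split and cross-validation mentioned in the Model Training step can be sketched with the standard library. This assumes rows are independent (a time-ordered housing series would instead need a chronological split), and the function names and defaults are hypothetical:

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed and split them into (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs that split range(n) into k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size
```

For k-fold cross-validation, shuffle the rows once before calling `k_fold_indices`; each row then appears in exactly one validation fold.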

Model Evaluation: Assess the performance of your model using appropriate evaluation metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE).
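The three metrics in the Model Evaluation step are short formulas: MAE averages absolute errors, MSE averages squared errors, and RMSE is the square root of MSE, which puts it back in the units of the price. A minimal sketch with hypothetical prices:

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical sale prices and predictions: errors of +10k, -10k, and 0.
y_true = [300000, 250000, 400000]
y_pred = [310000, 240000, 400000]
print(mae(y_true, y_pred))   # about 6666.67
print(rmse(y_true, y_pred))  # about 8164.97
```

RMSE is always at least as large as MAE, and the gap grows when a few predictions are far off, because squaring weights large errors more heavily.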

Hyperparameter Tuning: Fine-tune your model's hyperparameters to improve its performance. Techniques like grid search or random search can be employed for this purpose.
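Grid search simply evaluates every combination of candidate hyperparameter values and keeps the best one by validation score. A minimal sketch, assuming lower scores are better and using one-feature ridge regression (penalty `alpha`, intercept not penalized) as a hypothetical model to tune on made-up data; libraries such as scikit-learn provide a fuller version of this (`GridSearchCV`):

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score), lower is better."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fit_ridge_1d(x, y, alpha):
    """One-feature ridge regression: slope shrunk by alpha, intercept unpenalized."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / (sxx + alpha)
    return slope, y_bar - slope * x_bar


# Hypothetical training and validation data.
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
x_val, y_val = [5.0, 6.0], [10.0, 12.0]


def validation_mse(alpha):
    slope, intercept = fit_ridge_1d(x_train, y_train, alpha)
    errors = [yv - (intercept + slope * xv) for xv, yv in zip(x_val, y_val)]
    return sum(e * e for e in errors) / len(errors)


best_params, best_score = grid_search({"alpha": [0.0, 1.0, 10.0]}, validation_mse)
```

Random search follows the same pattern but samples a fixed number of combinations instead of enumerating all of them, which scales better when there are many hyperparameters.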

Deployment: Once satisfied with your model's performance, deploy it to make predictions on new data. This could be as simple as saving the trained model and creating an interface for users to input house features.
Negative Equity in U.S. Housing Market
kaggle.com
Updated Jan 10, 2023
Cite
The citation is currently not available for this dataset.
Dataset updated
Jan 10, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
Description
Negative Equity in U.S. Housing Market

Measuring Home Values, Debt, and Credit Risk

By Zillow Data [source]

About this dataset

This dataset, Negative Equity in the US Housing Market, provides an in-depth look at negative equity across the United States during a single quarter. It includes metrics such as the total amount of negative equity (in millions of dollars), the total number of homes in negative equity, and the percentage of mortgaged homes that are in negative equity, among others. These data points offer insight into regional and national trends in the prevalence of negative equity and in mortgage delinquency stemming from declines in home values from their peak levels.

Home types available for analysis include all homes, condos/co-ops, multifamily buildings with five or more housing units, and duplexes/triplexes. Cash-buyer rates for particular areas can also be determined from this collection. Further metrics, such as mortgage affordability and impacts on overall indebtedness, can be calculated using information related to Zillow's Home Value Index (ZHVI) forecast methodology and TransUnion data, respectively.

Other variables in this dataset include region type (e.g., city, county), size rank based on population, percentage change in ZHVI since peak levels, and loan-to-value ratios greater than 200 across all regions covered (NE). Zillow's own Secondary Mortgage Market Survey data are used to obtain average mortgage quote rates, while Census Bureau NCHS median household income figures capture the typical relationship between wages and debt obligations. Whether you want to assess effects at the metro level or in more detail at the ZIP code level, this dataset should support insightful exploration. Users must nonetheless comply with all conditions of the lead data provider's Terms of Use before accessing any of the resources included here.
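The headline metrics above follow from two numbers per home: a home is in negative equity when its outstanding mortgage balance exceeds its value, the negative-equity amount is the difference, and the loan-to-value (LTV) ratio is balance divided by value. A minimal sketch over hypothetical home-level records; the dataset itself ships pre-aggregated figures, and reading "greater than 200" as an LTV above 200% is an assumption:

```python
from dataclasses import dataclass


@dataclass
class Home:
    value: float         # estimated current home value, USD
    loan_balance: float  # outstanding mortgage debt, USD (0 if no mortgage)


def negative_equity_summary(homes):
    """Aggregate negative-equity metrics over mortgaged homes (hypothetical record layout)."""
    mortgaged = [h for h in homes if h.loan_balance > 0]
    underwater = [h for h in mortgaged if h.loan_balance > h.value]
    total_negative_equity = sum(h.loan_balance - h.value for h in underwater)
    share = len(underwater) / len(mortgaged) if mortgaged else 0.0
    # Assumption: "LTV greater than 200" means balance is more than twice the value.
    ltv_over_200 = sum(1 for h in mortgaged if h.loan_balance / h.value > 2.0)
    return {
        "total_negative_equity_musd": total_negative_equity / 1e6,
        "homes_in_negative_equity": len(underwater),
        "pct_mortgaged_in_negative_equity": 100.0 * share,
        "homes_ltv_over_200pct": ltv_over_200,
    }


homes = [
    Home(200_000, 250_000),  # underwater by 50k
    Home(300_000, 150_000),  # positive equity
    Home(100_000, 0),        # no mortgage, excluded from the denominator
    Home(100_000, 250_000),  # underwater by 150k, LTV 250%
]
summary = negative_equity_summary(homes)
```

Homes without a mortgage are excluded from the denominator, matching the description's "percentage of homes with mortgages that are in negative equity".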

Research Ideas

Analyzing regional and state trends in negative equity: Analyze geographic differences in the percentage of mortgages “underwater”, total amount of negative equity, number of homes at least 90 days late, and other key indicators to provide insight into the factors influencing negative equity across regions, states and cities.

Tracking the recovery rate over time: Track short-term changes in numbers related to negative equity (e.g., region or area ZHVI Change from Peak) to monitor recovery rates over time as well as how different policy interventions are affecting homeownership levels in affected areas.
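A change-from-peak series and a simple recovery measure can be computed from any monthly home-value series. A minimal sketch, assuming a plain list of monthly values and using the running maximum as the peak; the dataset's own "ZHVI Change from Peak" may be defined against a specific historical peak, and the function names are hypothetical:

```python
def change_from_peak(series):
    """Percent change of each month's value relative to the highest value seen so far."""
    peak = float("-inf")
    changes = []
    for value in series:
        peak = max(peak, value)
        changes.append(100.0 * (value - peak) / peak)
    return changes


def recovery_fraction(series):
    """Share of the peak-to-trough decline regained by the latest observation."""
    peak_idx = max(range(len(series)), key=series.__getitem__)  # first occurrence of the max
    peak = series[peak_idx]
    trough = min(series[peak_idx:])
    if trough == peak:
        return 1.0  # no decline after the peak
    return (series[-1] - trough) / (peak - trough)


# Hypothetical index values: rise, fall by 25%, then partial recovery.
values = [100.0, 120.0, 90.0, 105.0]
print(change_from_peak(values))   # [0.0, 0.0, -25.0, -12.5]
print(recovery_fraction(values))  # 0.5
```

Comparing these numbers across regions, or before and after a policy change, gives a rough view of how quickly different areas are recovering.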

Exploring best practices for promoting housing affordability: Compare affordability metrics (e.g., mortgage payments, price-to-income ratios) across geographic locations over time to identify best practices for empowering homeowners and promoting stability in the housing market, while reducing local inequality related to the availability of affordable housing and access to credit such as mortgages and other loans.
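The two affordability metrics named above can be computed directly. The monthly payment on a fixed-rate mortgage uses the standard annuity formula M = P·r / (1 − (1 + r)^(−n)), with monthly rate r and n monthly payments, and the price-to-income ratio is the home price divided by annual household income. A minimal sketch; the loan terms and incomes are hypothetical, and taxes and insurance are ignored:

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed-rate mortgage payment from the annuity formula."""
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n  # zero interest: equal principal repayments
    return principal * r / (1 - (1 + r) ** -n)


def price_to_income(price, annual_income):
    """Home price divided by annual household income."""
    return price / annual_income


def payment_to_income_share(payment, annual_income):
    """Share of gross annual income spent on mortgage payments."""
    return 12 * payment / annual_income


# Hypothetical example: 100,000 USD borrowed at 6% over 30 years.
payment = monthly_payment(100_000, 0.06, 30)  # about 599.55 per month
```

Computing the payment share with each region's median income and current mortgage rates makes areas with very different price levels directly comparable.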

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

See the dataset description for more information.

Columns

File: NESummary_2017Q1_Public.csv

| Column name | Description |
|:------------|:------------|
| RegionType | The type of region (e.g., city, county, metro, etc.) (String) |
| City | Name of the city (String) |
| County | Name of the county (String) |
| State | Name of the state (String) |
| Metro ...
Data from: Neighborhoods in New York
kaggle.com
Updated Jul 23, 2017
Cite
Jack Cook (2017). Neighborhoods in New York [Dataset]. https://www.kaggle.com/jackcook/neighborhoods-in-new-york
Available download formats: zip (1,069,387 bytes)
Dataset updated
Jul 23, 2017
Authors
Jack Cook
Area covered
New York
Description
Context

This dataset contains shapefiles outlining 558 neighborhoods in 50 major cities in New York state, notably including Albany, Buffalo, Ithaca, New York City, Rochester, and Syracuse. This adds context to your datasets by identifying the neighborhood of any locations you have, as coordinates on their own don't carry a lot of information.
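Identifying the neighborhood of a location comes down to a point-in-polygon test. A minimal ray-casting sketch, assuming the neighborhood outlines have already been read from the shapefile into lists of (x, y) vertices in the same coordinate system as your points (the .prj file describes that system); the neighborhood names and coordinates below are hypothetical, and polygons with holes or multiple parts are not handled:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray from (x, y) toward +x; odd means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_neighborhood(x, y, neighborhoods):
    """Return the name of the first neighborhood containing the point, or None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical square neighborhoods in arbitrary planar coordinates.
neighborhoods = {
    "Northside": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "Eastside": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
```

Checking every polygon is fine for a few hundred neighborhoods; for millions of points (such as taxi pickups) a bounding-box prefilter or spatial index would speed things up considerably.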

Content

Four files describing the shapes are included: an SHX file, a DBF file, an SHP file, and a PRJ file. All of them must be included in your input data, because each one contains part of the data; no single file has everything you need.
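Because every part must sit next to the others under the same base name, it can help to verify the set before loading. A minimal sketch, assuming the four extensions listed above and a hypothetical path to the .shp file:

```python
from pathlib import Path

# The four parts listed for this dataset.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no matching file beside shp_path."""
    path = Path(shp_path)
    # with_suffix swaps only the final extension, so dotted base names stay intact.
    return [ext for ext in REQUIRED_EXTENSIONS if not path.with_suffix(ext).exists()]


# Hypothetical usage: an empty list means all four parts are present.
# missing_shapefile_parts("input/neighborhoods.shp")
```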

Because none of these files is plain text, they can be a little difficult to set up. I highly recommend using mapshaper.org to get started: the site draws the boundaries on a plane and lets you export the files to a number of other formats (e.g., GeoJSON, CSV) if you can't use them in the format provided. Personally, though, I have found the shapefile format easier to work with.

To get started with the shapefile in R, you can use the rgdal and rgeos packages. For an example of these packages in use, see my kernel, "Incorporating neighborhoods into your model".

Acknowledgements

These files were provided by Zillow and are available under a Creative Commons license.

Inspiration

I'll be using these in the NYC Taxi Trip Duration competition to add context to the pickup and dropoff locations of the taxi rides and hopefully greatly improve my predictions.