23 datasets found
  1. US Cost of Living Dataset (1877 Counties)

    • kaggle.com
    zip
    Updated Feb 17, 2024
    Cite
    asaniczka (2024). US Cost of Living Dataset (1877 Counties) [Dataset]. https://www.kaggle.com/datasets/asaniczka/us-cost-of-living-dataset-3171-counties
    Available download formats: zip (1,282,159 bytes)
    Dataset updated
    Feb 17, 2024
    Authors
    asaniczka
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    The US Family Budget Dataset provides insights into the cost of living in different US counties based on the Family Budget Calculator by the Economic Policy Institute (EPI).

    This dataset offers community-specific estimates for ten family types, including one or two adults with zero to four children, in all 1877 counties and metro areas across the United States.

    Interesting Task Ideas:

    1. See how family budgets compare to the federal poverty line and the Supplemental Poverty Measure in different counties.
    2. Look into the money challenges faced by different types of families using the budgets provided.
    3. Find out which counties are the most affordable for housing, food, transportation, healthcare, childcare, and other necessities.
    4. Explore how the average income of families relates to the overall cost of living in different counties.
    5. Investigate how family size affects the estimated budget and find counties where bigger families have higher costs.
    6. Create visuals showing how the cost of living varies across different states and big cities.
    7. Check whether specific counties are affordable for families of different sizes and types.
    8. Use the dataset to compare living standards and economic security in different US counties.
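    Idea 1 reduces to a ratio: a family budget divided by the poverty threshold for the same family type. The following is a minimal Python sketch under stated assumptions: the county names, budget figures, and poverty line are made-up placeholders (not values from the dataset), and `budget_to_poverty_ratio` and `rank_counties` are hypothetical helper names.

```python
# Hypothetical illustration: all counties, budgets, and the poverty line are
# placeholder values, not figures taken from the EPI dataset.


def budget_to_poverty_ratio(annual_budget: float, poverty_line: float) -> float:
    """Return how many times the poverty line a family budget represents."""
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    return annual_budget / poverty_line


def rank_counties(budgets: dict[str, float], poverty_line: float) -> list[tuple[str, float]]:
    """Sort counties from highest to lowest budget-to-poverty ratio."""
    ratios = {county: budget_to_poverty_ratio(b, poverty_line) for county, b in budgets.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample_budgets = {"County A": 90_000.0, "County B": 60_000.0, "County C": 75_000.0}
    for county, ratio in rank_counties(sample_budgets, poverty_line=30_000.0):
        print(f"{county}: {ratio:.2f}x the poverty line")
```

    Because the poverty line differs by family size, a real comparison would pass the threshold matching each of the ten family types rather than a single number.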


    Check out my other datasets

    Employment-to-Population Ratio for USA

    Productivity and Hourly Compensation

    130K Kindle Books

    900K TMDb Movies

    USA Unemployment Rates by Demographics & Race

    Photo by Alev Takil on Unsplash

  2. Average Household Income in the United States-Copy

    • umn.hub.arcgis.com
    Updated Dec 10, 2022
    Cite
    University of Minnesota (2022). Average Household Income in the United States-Copy [Dataset]. https://umn.hub.arcgis.com/maps/87822c1c7dda498fbc04bb27ecc10942
    Dataset updated
    Dec 10, 2022
    Dataset authored and provided by
    University of Minnesota
    Area covered
    Description

    This map shows the average household income in the U.S. in 2022 in a multiscale map by country, state, county, ZIP Code, tract, and block group. Information for the average household income is an estimate of income for calendar year 2022. Income amounts are expressed in current dollars, including an adjustment for inflation or cost-of-living increases.

    The pop-up is configured to include the following information for each geography level:

    • Average household income
    • Median household income
    • Count of households by income group
    • Average household income by householder age group

    The data shown is from Esri's 2022 Updated Demographic estimates using Census 2020 geographies. The map adds increasing levels of detail as you zoom in, from state, to county, to ZIP Code, to tract, to block group data.

    Esri's U.S. Updated Demographic (2022/2027) Data: Population, age, income, sex, race, home value, and marital status are among the variables included in the database. Each year, Esri's Data Development team employs its proven methodologies to update more than 2,000 demographic variables for a variety of U.S. geographies.

    Additional Esri Resources:

    • Esri Demographics
    • U.S. 2022/2027 Esri Updated Demographics
    • Essential demographic vocabulary

    This item is for visualization purposes only and cannot be exported or used in analysis. Permitted use of this data is covered in the DATA section of the Esri Master Agreement (E204CW) and these supplemental terms.

  3. American House Prices

    • kaggle.com
    zip
    Updated Dec 9, 2023
    Cite
    Jeremy Larcher (2023). American House Prices [Dataset]. https://www.kaggle.com/datasets/jeremylarcher/american-house-prices-and-demographics-of-top-cities
    Available download formats: zip (682,260 bytes)
    Dataset updated
    Dec 9, 2023
    Authors
    Jeremy Larcher
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    United States
    Description

    A dataset of housing and demographic variables for the 50 most populous American cities.

    Variables:

    Zip Code: Zip code within which the listing is present.

    Price: Listed price for the property.

    Beds: Number of beds mentioned in the listing.

    Baths: Number of baths mentioned in the listing.

    Living Space: The total size of the living space, in square feet, mentioned in the listing.

    Address: Street address of the listing.

    City: City name where the listing is located.

    State: State name where the listing is located.

    Zip Code Population: The estimated number of individuals within the zip code. Data from Simplemaps.com.

    Zip Code Density: The estimated number of individuals per square mile within the zip code. Data from Simplemaps.com.

    County: County where the listing is located.

    Median Household income: Estimated median household income. Data from the U.S. Census Bureau.

    Latitude: Latitude of the zip code. Data from Simplemaps.com.

    Longitude: Longitude of the zip code. Data from Simplemaps.com.
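    Because each row carries both a listing price and the ZIP code's median household income, simple affordability indicators can be derived per ZIP code. A Python sketch, assuming CSV headers named after the variables above (the real file's spellings may differ) and using made-up rows; `price_metrics` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up sample rows. The header names follow the variable list above;
# the exact spellings in the real CSV are an assumption.
SAMPLE_CSV = """Zip Code,Price,Living Space,Median Household Income
10001,900000,1500,90000
10001,600000,1000,90000
94103,1200000,1200,120000
"""


def price_metrics(rows):
    """Per ZIP code: median price per square foot and median price-to-income ratio."""
    per_sqft = defaultdict(list)
    to_income = defaultdict(list)
    for row in rows:
        try:
            zip_code = row["Zip Code"]
            price = float(row["Price"])
            space = float(row["Living Space"])
            income = float(row["Median Household Income"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
        if space <= 0 or income <= 0:
            continue  # avoid dividing by zero or by nonsensical values
        per_sqft[zip_code].append(price / space)
        to_income[zip_code].append(price / income)
    return {
        zip_code: {
            "price_per_sqft": median(per_sqft[zip_code]),
            "price_to_income": median(to_income[zip_code]),
        }
        for zip_code in per_sqft
    }


if __name__ == "__main__":
    for zip_code, m in price_metrics(csv.DictReader(io.StringIO(SAMPLE_CSV))).items():
        print(zip_code, round(m["price_per_sqft"], 1), round(m["price_to_income"], 2))
```

    Using the median rather than the mean keeps a single luxury listing from dominating a ZIP code's figures.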

  4. Cost of Living in Nairobi

    • kaggle.com
    zip
    Updated Feb 15, 2025
    Cite
    Yacooti (2025). Cost of Living in Nairobi [Dataset]. https://www.kaggle.com/datasets/yacooti/cost-of-living-in-nairobi
    Available download formats: zip (1,110,787 bytes)
    Dataset updated
    Feb 15, 2025
    Authors
    Yacooti
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Nairobi
    Description

    🏡 Cost of Living in Nairobi, Kenya

    📌 Overview

    This dataset provides a detailed time-series estimate of the monthly cost of living across 20 different areas in Nairobi, Kenya from 2019 to 2024. It covers essential expenses such as rent, food, transport, utilities, and miscellaneous costs, allowing for comprehensive cost-of-living analysis.

    This dataset is useful for:
    ✅ Individuals planning to move to Nairobi
    ✅ Researchers analyzing long-term cost trends
    ✅ Businesses assessing salary benchmarks based on inflation
    ✅ Data scientists developing predictive models for cost forecasting

    📊 Data Summary

    • Total Records: 60,000 (5 years of monthly data)
    • Columns:
      • 🏠 Area: The residential area in Nairobi
      • 💰 Rent: Estimated monthly rent (KES)
      • 🍽️ Food: Grocery and dining expenses (KES)
      • 🚕 Transport: Public and private transport costs (KES)
      • Utilities: Water, electricity, and internet bills (KES)
      • 🎭 Misc: Entertainment, personal care, and leisure expenses (KES)
      • 🏷️ Total: Sum of all expenses
      • 📆 Date: Monthly timestamp from January 2019 to December 2024

    📍 Areas Covered

    This dataset provides cost estimates for 20+ residential areas, including:
    - High-End Areas 🏡: Kileleshwa, Westlands, Karen
    - Mid-Range Areas 🏙️: South B, Langata, Ruaka
    - Affordable Areas 🏠: Embakasi, Kasarani, Githurai, Ruiru, Umoja
    - Satellite Towns 🌿: Ngong, Rongai, Thika, Kitengela, Kikuyu

    🛠️ How the Data Was Generated

    This dataset was synthetically generated using Python, incorporating realistic market variations. The process includes:

    Inflation Modeling 📈 – A 2% annual increase in costs over time.
    Seasonal Effects 📅 – Higher food and transport costs in December & January (holiday season), rent spikes in June & July.
    Economic Shocks ⚠️ – A 5% chance per record of external economic effects (e.g., fuel price hikes, supply chain issues).
    Random Fluctuations 🔄 – Expenses vary slightly month-to-month to simulate real-world spending behavior.
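    The generation steps above can be approximated in a few lines of Python. This is a hedged sketch, not the author's script: the base costs, the shock and seasonal multipliers, and the noise level are assumptions chosen for illustration; only the 2% inflation rate, the 5% shock probability, and the seasonal months come from the description.

```python
import random
from datetime import date

# Assumed base monthly costs in KES; the original generator's parameters are not published.
BASE_COSTS = {"Rent": 30000.0, "Food": 15000.0, "Transport": 6000.0, "Utilities": 5000.0, "Misc": 4000.0}
ANNUAL_INFLATION = 0.02   # 2% per year, as stated in the description
SHOCK_PROBABILITY = 0.05  # 5% chance per record, as stated in the description
SHOCK_MULTIPLIER = 1.10   # assumed size of an economic shock
SEASONAL_BOOST = 1.10     # assumed size of the seasonal uplift
NOISE = 0.03              # assumed +/-3% month-to-month fluctuation


def simulate_month(area: str, year: int, month: int, rng: random.Random) -> dict:
    """Generate one synthetic monthly record for one area."""
    years_elapsed = (year - 2019) + (month - 1) / 12
    inflation = (1 + ANNUAL_INFLATION) ** years_elapsed  # compounding 2% per year
    shock = SHOCK_MULTIPLIER if rng.random() < SHOCK_PROBABILITY else 1.0
    record = {"Area": area, "Date": date(year, month, 1).isoformat()}
    for item, base in BASE_COSTS.items():
        seasonal = 1.0
        if item in ("Food", "Transport") and month in (12, 1):
            seasonal = SEASONAL_BOOST  # holiday season
        elif item == "Rent" and month in (6, 7):
            seasonal = SEASONAL_BOOST  # mid-year rent spike
        noise = rng.uniform(1 - NOISE, 1 + NOISE)  # random month-to-month variation
        record[item] = round(base * inflation * seasonal * shock * noise, 2)
    record["Total"] = round(sum(record[item] for item in BASE_COSTS), 2)
    return record


def simulate(areas, start_year=2019, end_year=2024, seed=42):
    """Generate monthly records for every area from January start_year to December end_year."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [
        simulate_month(area, year, month, rng)
        for area in areas
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
    ]


if __name__ == "__main__":
    rows = simulate(["Westlands", "Karen", "Embakasi"])
    print(len(rows), rows[0])
```

    Applying the shock multiplier to every expense in a record is one reading of "a 5% chance per record"; the original generator may instead have targeted specific categories such as transport.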

    🔍 Potential Use Cases

    • 📊 Cost of Living Analysis – Compare affordability across different Nairobi areas.
    • 💵 Salary & Real Estate Benchmarking – Businesses can analyze salary expectations by location.
    • 📉 Time-Series Forecasting – Train predictive models (ARIMA, Prophet, LSTM) to estimate future living costs.
    • 📈 Inflation Impact Studies – Measure how economic conditions influence cost variations over time.

    ⚠️ Limitations

    • Synthetic Data – The dataset is not based on real survey data but follows market trends.
    • No Lifestyle Adjustments – Differences in household size or spending habits are not factored in.
    • Inflation Approximation – While inflation is simulated at 2% annually, actual inflation rates may differ.

    📁 File Format & Access

    • nairobi_cost_of_living_time_series.csv – 60,000 records in CSV format (time-series structured).

    📢 Acknowledgments

    This dataset was generated for research and educational purposes. If you find it useful, consider citing it in your work. 🚀

    📥 Download and Explore the Data Now!


  5. Household income distributions by geography (U.S. Census ACS)

    • efficientdollar.com
    Updated Nov 30, 2025
    Cite
    U.S. Census Bureau (2025). Household income distributions by geography (U.S. Census ACS) [Dataset]. https://efficientdollar.com/tools/income-percentile
    Dataset updated
    Nov 30, 2025
    Dataset authored and provided by
    U.S. Census Bureau
    License

    https://www.census.gov/data/developers/about/terms-of-service.html

    Area covered
    United States
    Description

    Household income bucket counts by ZIP code, city, county, state, and the U.S. overall, based on American Community Survey (ACS) 5-year estimates.
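    Bucket counts like these support percentile estimates: count the households in buckets below a given income and interpolate within the bucket that contains it. A minimal Python sketch, assuming households are spread uniformly within each bucket; the bucket edges and counts are placeholders rather than actual ACS values, and because ACS's top income bucket is open-ended, the code needs an assumed upper edge for it.

```python
from bisect import bisect_right


def income_percentile(income: float, edges: list[float], counts: list[int]) -> float:
    """Estimate the percentage (0-100) of households earning less than `income`.

    Bucket i covers [edges[i], edges[i + 1]), so `edges` has one more entry than
    `counts`. Households are assumed to be spread uniformly within each bucket.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not all be zero")
    if income <= edges[0]:
        return 0.0
    if income >= edges[-1]:
        return 100.0
    i = bisect_right(edges, income) - 1  # index of the bucket containing `income`
    below = sum(counts[:i])  # households in lower buckets
    below += counts[i] * (income - edges[i]) / (edges[i + 1] - edges[i])  # linear share of bucket i
    return 100.0 * below / total


if __name__ == "__main__":
    # Placeholder buckets: $0-25k, $25-50k, $50-100k, $100-200k (upper edge assumed).
    edges = [0, 25_000, 50_000, 100_000, 200_000]
    counts = [10, 20, 40, 30]
    print(income_percentile(75_000, edges, counts))  # -> 50.0
```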

  6. Zillow Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 19, 2022
    Cite
    Bright Data (2022). Zillow Datasets [Dataset]. https://brightdata.com/products/datasets/zillow
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Dec 19, 2022
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Gain a complete view of the real estate market with our Zillow datasets. Track price trends, rental/sale status, and price per square foot with the Zillow Price History dataset, and explore detailed listings with prices, locations, and features using the Zillow Properties Listing dataset.

    • Over 134M records available
    • Price starts at $250 per 100K records
    • Data formats: JSON, NDJSON, CSV, XLSX, and Parquet
    • 100% ethical and compliant data collection

    Included datapoints:

    Zpid
    City
    State
    Home Status
    Street Address
    Zipcode
    Home Type
    Living Area Value
    Bedrooms
    Bathrooms
    Price
    Property Type
    Date Sold
    Annual Homeowners Insurance
    Price Per Square Foot
    Rent Zestimate
    Tax Assessed Value
    Zestimate
    Home Values
    Lot Area
    Lot Area Unit
    Living Area
    Living Area Units
    Property Tax Rate
    Page View Count
    Favorite Count
    Time On Zillow
    Time Zone
    Abbreviated Address
    Brokerage Name
    And much more
    
  7. Average Household Income in the United States

    • dbechard-open-data-gisanddata.hub.arcgis.com
    Updated Jun 26, 2018
    Cite
    Esri (2018). Average Household Income in the United States [Dataset]. https://dbechard-open-data-gisanddata.hub.arcgis.com/maps/6d7b0a1dcad847be820c3d1424f79dd8
    Dataset updated
    Jun 26, 2018
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    Description

    Retirement Notice: This item is in mature support as of June 2023 and will be retired in December 2025. A replacement item has not been identified at this time. Esri recommends updating your maps and apps to phase out use of this item.

    This map shows the average household income in the U.S. in 2022 in a multiscale map by country, state, county, ZIP Code, tract, and block group. Information for the average household income is an estimate of income for calendar year 2022. Income amounts are expressed in current dollars, including an adjustment for inflation or cost-of-living increases.

    The pop-up is configured to include the following information for each geography level:

    • Average household income
    • Median household income
    • Count of households by income group
    • Average household income by householder age group

    Permitted use of this data is covered in the DATA section of the Esri Master Agreement (E204CW) and these supplemental terms.

  8. USA Real Estate Dataset

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Ahmed Shahriar Sakib (2024). USA Real Estate Dataset [Dataset]. https://www.kaggle.com/datasets/ahmedshahriarsakib/usa-real-estate-dataset/
    Available download formats: zip (40,085,115 bytes)
    Dataset updated
    Mar 30, 2024
    Authors
    Ahmed Shahriar Sakib
    Area covered
    United States
    Description

    Context

    This dataset contains real estate listings in the US, broken down by state and ZIP code.

    Download

    Kaggle API command: !kaggle datasets download -d ahmedshahriarsakib/usa-real-estate-dataset

    Content

    The dataset has one CSV file with 12 columns:

    1. realtor-data.csv (2,226,382 entries)
      • brokered by (categorically encoded agency/broker)
      • status (Housing status - a. ready for sale or b. ready to build)
      • price (Housing price: either the current listing price or, if the house was sold recently, the recent sale price)
      • bed (# of beds)
      • bath (# of bathrooms)
      • acre_lot (Property / Land size in acres)
      • street (categorically encoded street address)
      • city (city name)
      • state (state name)
      • zip_code (postal code of the area)
      • house_size (house area/size/living space in square feet)
      • prev_sold_date (Previously sold date)

    NB:
    1. brokered by and street addresses were categorically encoded due to data privacy policy.
    2. acre_lot means the total land area, and house_size denotes the living space/building area.

    Acknowledgements

    Data was collected from https://www.realtor.com/, a real estate listing website operated by the News Corp subsidiary Move, Inc. and based in Santa Clara, California. It is the second most visited real estate listing website in the United States as of 2024, with over 100 million monthly active users.

    Cover Image

    Image by Mohamed Hassan from Pixabay

    Disclaimer

    The data and information in this dataset are intended for educational purposes only. I do not own any data, and all rights belong to the respective owners.

    Inspiration

    • Can we predict housing prices based on the features?
    • How are housing price and location attributes correlated?
    • What is the overall picture of the USA housing prices w.r.t. locations?
    • Do house attributes (bedroom, bathroom count) strongly correlate with the price? Are there any hidden patterns?
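    The correlation questions in the Inspiration list can be explored with a Pearson coefficient. A standard-library Python sketch, assuming realtor-data.csv has header columns named `bed` and `price` as listed above; rows with missing or non-numeric values are skipped. `pearson` and `bed_price_correlation` are hypothetical helpers.

```python
import csv
import math
import os


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def bed_price_correlation(path: str) -> float:
    """Correlate bedroom count with price, skipping incomplete rows."""
    beds, prices = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                bed, price = float(row["bed"]), float(row["price"])
            except (KeyError, TypeError, ValueError):
                continue  # missing or non-numeric value
            beds.append(bed)
            prices.append(price)
    return pearson(beds, prices)


if __name__ == "__main__":
    if os.path.exists("realtor-data.csv"):  # only runs once the Kaggle file is downloaded
        print(bed_price_correlation("realtor-data.csv"))
```

    Correlation only measures linear association; a strong coefficient for bedrooms may partly reflect house_size, so the "hidden patterns" question is better approached with multivariate models.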

  9. LA Pollution BalderramaM

    • univredlands.hub.arcgis.com
    Updated Apr 10, 2018
    Cite
    URSpatial (2018). LA Pollution BalderramaM [Dataset]. https://univredlands.hub.arcgis.com/maps/9fcedc7ed0f847fe886b684801c81c56
    Dataset updated
    Apr 10, 2018
    Dataset authored and provided by
    URSpatial
    Area covered
    Description

    This map shows the median household income in the United States in 2012. Information for the 2012 Median Household Income is an estimate of income for calendar year 2012. Income amounts are expressed in current dollars, including an adjustment for inflation or cost-of-living increases. The median is the value that divides the distribution of household income into two equal parts. The median household income in the United States overall was $50,157 in 2012. This map shows Esri's 2012 estimates using Census 2010 geographies. The data shown is from Esri's 2012 Updated Demographics. The map adds increasing levels of detail as you zoom in, from state, to county, to ZIP Code, to tract, to block group data.

    The map is designed to be displayed in conjunction with the Canvas basemap with a transparency of 25%. To use it on other basemaps, try a transparency of 25-50%. Information about the USA Median Household Income map service used in this map is here.

  10. House Pricing Dataset

    • kaggle.com
    zip
    Updated Jan 27, 2025
    Cite
    Aly El-badry (2025). House Pricing Dataset [Dataset]. https://www.kaggle.com/datasets/alyelbadry/house-pricing-dataset
    Available download formats: zip (815,554 bytes)
    Dataset updated
    Jan 27, 2025
    Authors
    Aly El-badry
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    House Prices Dataset

    Subtitle:

    Detailed Real Estate Data for Predicting House Prices and Analyzing Market Trends

    Description:

    This dataset contains information on 21,613 properties, making it a comprehensive resource for exploring real estate market trends and building predictive models for house prices. The data includes various features capturing property details, location, and market conditions, providing ample opportunities for data exploration, visualization, and machine learning applications.

    Key Features:

    • General Information:

      • id: Unique identifier for each property.
      • date: Date of sale.
    • Price Details:

      • price: Sale price of the house.
    • Property Features:

      • bedrooms: Number of bedrooms.
      • bathrooms: Number of bathrooms (including partials as fractions).
      • sqft_living: Living space area in square feet.
      • sqft_lot: Lot size in square feet.
      • floors: Number of floors.
      • waterfront: Whether the property has a waterfront view.
      • view: Quality of the view rating.
      • condition: Overall condition of the house.
      • grade: Grade of construction and design (scale of 1–13).
    • Additional Metrics:

      • sqft_above: Square footage of the property above ground.
      • sqft_basement: Basement area in square feet.
      • yr_built: Year the property was built.
      • yr_renovated: Year of last renovation.
    • Location Coordinates:

      • zipcode: ZIP code of the property.
      • lat and long: Latitude and longitude coordinates.
    • Neighbor Comparisons:

      • sqft_living15: Average living space of 15 nearest properties.
      • sqft_lot15: Average lot size of 15 nearest properties.

    Use Cases:

    • Predicting house prices using regression models.
    • Identifying the impact of various features (e.g., number of bedrooms, location) on property prices.
    • Analyzing market trends and spatial distribution of real estate prices.

    This dataset is a valuable resource for anyone interested in real estate analytics, machine learning, or geographic data visualization.
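    The first use case, predicting prices with regression, can start from a one-feature ordinary least squares fit of `price` on `sqft_living`. A minimal Python sketch with made-up numbers; `fit_simple_ols` is a hypothetical helper, and a realistic model would use more of the features listed above.

```python
def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # covariance over variance of x
    return slope, mean_y - slope * mean_x  # line passes through the means


if __name__ == "__main__":
    sqft_living = [1000.0, 2000.0, 3000.0]  # made-up values
    price = [200_000.0, 400_000.0, 600_000.0]  # made-up values
    slope, intercept = fit_simple_ols(sqft_living, price)
    print(f"price ~= {slope:.1f} * sqft_living + {intercept:.1f}")
```

    The slope reads directly as the estimated price change per additional square foot of living space.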

  11. House_Prices_Dataset

    • kaggle.com
    zip
    Updated May 23, 2024
    Cite
    Beytullah Soylev (2024). House_Prices_Dataset [Dataset]. https://www.kaggle.com/datasets/soylevbeytullah/house-prices-dataset
    Available download formats: zip (798,275 bytes)
    Dataset updated
    May 23, 2024
    Authors
    Beytullah Soylev
    Description

    The dataset includes house sale prices for King County, USA, for homes sold between May 2014 and May 2015.

    Columns:
      • id: identifier for a house
      • date: date the house was sold
      • price: sale price (the prediction target)
      • bedrooms: number of bedrooms per house
      • bathrooms: number of bathrooms per house
      • sqft_living: square footage of the home
      • sqft_lot: square footage of the lot
      • floors: total floors (levels) in the house
      • waterfront: whether the house has a view of the waterfront
      • view: has been viewed
      • condition: how good the overall condition is
      • grade: overall grade given to the housing unit, based on the King County grading system
      • sqft_above: square footage of the house apart from the basement
      • sqft_basement: square footage of the basement
      • yr_built: year built
      • yr_renovated: year when the house was renovated
      • zipcode: ZIP code
      • lat: latitude coordinate
      • long: longitude coordinate
      • sqft_living15: living room area in 2015 (implies some renovations)
      • sqft_lot15: lot size area in 2015 (implies some renovations)

  12. USA House Prices

    • kaggle.com
    zip
    Updated Jul 21, 2024
    Cite
    Fırat Özcan (2024). USA House Prices [Dataset]. https://www.kaggle.com/datasets/fratzcan/usa-house-prices/code
    Available download formats: zip (121,422 bytes)
    Dataset updated
    Jul 21, 2024
    Authors
    Fırat Özcan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    United States
    Description

    Real estate markets are of great importance for both local and international investors. Sydney and Melbourne are two dynamic markets where economic and social factors have significant impacts on property prices. Below is a detailed description of each feature:

    1. Date: The date when the property was sold. This feature helps in understanding the temporal trends in property prices.
    2. Price: The sale price of the property in USD. This is the target variable we aim to predict.
    3. Bedrooms: The number of bedrooms in the property. Generally, properties with more bedrooms tend to have higher prices.
    4. Bathrooms: The number of bathrooms in the property. Similar to bedrooms, more bathrooms can increase a property’s value.
    5. Sqft Living: The size of the living area in square feet. Larger living areas are typically associated with higher property values.
    6. Sqft Lot: The size of the lot in square feet. Larger lots may increase a property’s desirability and value.
    7. Floors: The number of floors in the property. Properties with multiple floors may offer more living space and appeal.
    8. Waterfront: A binary indicator (1 if the property has a waterfront view, 0 otherwise). Properties with waterfront views are often valued higher.
    9. View: An index from 0 to 4 indicating the quality of the property’s view. Better views are likely to enhance a property’s value.
    10. Condition: An index from 1 to 5 rating the condition of the property. Properties in better condition are typically worth more.
    11. Sqft Above: The square footage of the property above the basement. This can help isolate the value contribution of above-ground space.
    12. Sqft Basement: The square footage of the basement. Basements may add value depending on their usability.
    13. Yr Built: The year the property was built. Older properties may have historical value, while newer ones may offer modern amenities.
    14. Yr Renovated: The year the property was last renovated. Recent renovations can increase a property’s appeal and value.
    15. Street: The street address of the property. This feature can be used to analyze location-specific price trends.
    16. City: The city where the property is located. Different cities have distinct market dynamics.
    17. Statezip: The state and zip code of the property. This feature provides regional context for the property.
    18. Country: The country where the property is located. While this dataset focuses on properties in Australia, this feature is included for completeness.


  13. Realtor Real Estate USA

    • kaggle.com
    Updated Oct 12, 2023
    Cite
    Neeraj (2023). Realtor Real Estate USA [Dataset]. https://www.kaggle.com/datasets/neerajkld/realtor-real-estate-usa
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    Kaggle
    Authors
    Neeraj
    Area covered
    United States
    Description

    Context

    This dataset shows real estate listings in the USA, including price, ZIP code, and other attributes.

    Sources

    This shows real estate data from the company Realtor (https://www.realtor.com). I downloaded the dataset from Kaggle.

    About Dataset

    One CSV file, realtor-data.csv (100k+ entries), contains 10 columns:
      • status (Housing status - a. ready for sale or b. ready to build)
      • bed (# of beds)
      • bath (# of bathrooms)
      • acre_lot (Property / Land size in acres)
      • city (city name)
      • state (state name)
      • zip_code (postal code of the area)
      • house_size (house area/size/living space in square feet)
      • prev_sold_date (Previously sold date)
      • price (Housing price: either the current listing price or, if the house was sold recently, the recent sale price)

    Cover Image

    Downloaded from Google Stock images.

    Disclaimer

    The data and information in this dataset are intended for educational purposes only. I do not own any data, and all rights belong to the respective owners.

  14. Housing Prices Dataset

    • kaggle.com
    zip
    Updated Jan 12, 2022
    Cite
    M Yasser H (2022). Housing Prices Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/housing-prices-dataset
    Available download formats: zip (4,740 bytes)
    Dataset updated
    Jan 12, 2022
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Description:

    A simple yet challenging project: predict the housing price based on factors such as house area, number of bedrooms, furnishing status, and proximity to a main road. The dataset is small, yet its complexity arises from strong multicollinearity. Can you overcome these obstacles and build a decent predictive model?
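    Multicollinearity is commonly quantified with the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. With only two predictors, that R² equals the square of their Pearson correlation, which allows the compact Python sketch here; `vif_two_predictors` is a hypothetical helper, and the sample feature values are made up.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def vif_two_predictors(x1: list[float], x2: list[float]) -> float:
    """VIF for either predictor in a model with exactly two predictors."""
    r = pearson(x1, x2)
    denom = 1.0 - r * r  # 1 - R^2, where R^2 = r^2 for a single other predictor
    if denom < 1e-12:
        return math.inf  # (near-)perfect collinearity
    return 1.0 / denom


if __name__ == "__main__":
    area = [1.0, 2.0, 3.0, 4.0]  # made-up feature values
    bedrooms = [1.0, 3.0, 2.0, 4.0]
    print(round(vif_two_predictors(area, bedrooms), 3))  # r = 0.8, so VIF = 1 / 0.36
```

    With more than two predictors, each VIF requires a full regression of that predictor on all the others. A frequently cited rule of thumb treats values above roughly 5 or 10 as a sign of problematic collinearity.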

    Acknowledgement:

    Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102. Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.

    Objective:

    • Understand the dataset and clean it up (if required).
    • Build regression models to predict the sale price using single and multiple features.
    • Evaluate the models and compare their respective scores, such as R2, RMSE, etc.

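    The two scores named in the last objective have short definitions: RMSE is the square root of the mean squared residual, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. A plain-Python sketch of both, with hypothetical helper names `rmse` and `r2_score` (scikit-learn's `sklearn.metrics` module provides equivalents, but this version needs no dependencies).

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total variation around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    actual = [3.0, 5.0, 7.0]
    predicted = [4.0, 5.0, 6.0]
    print(rmse(actual, predicted), r2_score(actual, predicted))
```

    RMSE is in the same units as the price, which makes it easy to interpret, while R² is unitless and comparable across models fitted to the same data.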
  15. Redfin data of SoCal

    • figshare.com
    csv
    Updated Nov 1, 2025
    Cite
    Michael Quintana (2025). Redfin data of SoCal [Dataset]. http://doi.org/10.6084/m9.figshare.30506468.v1
    Available download formats: csv
    Dataset updated
    Nov 1, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michael Quintana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Southern California
    Description

    This dataset supports the research article “Predicting Residential Property Values Using XGBoost and Spatial–Temporal Encoding: Evidence from Southern California” by Michael Quintana (2025). The dataset contains a cleaned and anonymized subset of residential property transactions derived from Redfin’s publicly available data export (June 2025). Each observation represents a single-family home, condominium, or townhouse sold in Southern California. Variables include sale price, living area, lot size, year built, bedrooms, bathrooms, ZIP code, and days on market.

    The dataset was used to train and validate an XGBoost regression model designed to estimate home prices using both structural and spatial–temporal features. All personally identifiable or proprietary location data have been removed or aggregated at the ZIP-code level to maintain privacy while preserving statistical utility. This dataset and the accompanying R scripts allow replication of the core results presented in the study, including model training, feature importance analysis, and predictive performance evaluation.

  16. House Price Prediction Treated Dataset

    • kaggle.com
    zip
    Updated Oct 22, 2024
    Cite
    Vinicius Araujo (2024). House Price Prediction Treated Dataset [Dataset]. https://www.kaggle.com/datasets/aravinii/house-price-prediction-treated-dataset
    Available download formats: zip (286,105 bytes)
    Dataset updated
    Oct 22, 2024
    Authors
    Vinicius Araujo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Same dataset as "House Sales in King County, USA", but with treated content and with a split version (train-test) allowing direct use in machine learning models.

    The dataset has 14 columns:

    • date: Date of the home sale
    • price: Price of each home sold
    • bedrooms: Number of bedrooms
    • bathrooms: Number of bathrooms
    • living_in_m2: Square meters of the apartment's interior living space
    • nice_view: A flag that indicates the view's quality of a property
    • perfect_condition: A flag that indicates the maximum index of the apartment condition
    • grade: An index from 1 to 5, where 1 indicates construction and design that fall short of the quality standard and 5 indicates a high quality level
    • has_basement: A flag indicating whether or not a property has a basement
    • renovated: A flag if the property was renovated
    • has_lavatory: Check for the presence of these incomplete/secondary bathrooms (bathtub, sink, toilet)
    • single_floor: A flag indicating whether the property had only one floor
    • month: The month of the home sale
    • quartile_zone: A quartile distribution index of the most expensive zip codes, where 1 means less expansive and 4 most expansive.
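
    The description does not say how the treated columns were derived from the raw King County data. The sketch below shows one plausible derivation for a subset of them, assuming raw column names as in the King County listings further down (`sqft_living`, `sqft_basement`, `yr_renovated`, `floors`, `zipcode`). Only the square-foot to square-meter conversion is exact; the flag rules, the quartile rule for `quartile_zone`, and the function name are assumptions. `nice_view`, `perfect_condition`, `grade` and `has_lavatory` are left out because their mapping is not documented.

```python
import pandas as pd

SQFT_TO_M2 = 0.09290304  # exact: 1 ft = 0.3048 m, so 1 ft^2 = 0.09290304 m^2


def treat_king_county(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-derivation of some treated columns from the raw King County data.

    The rules below are assumptions, not the dataset author's documented pipeline.
    """
    out = pd.DataFrame(index=raw.index)
    out["date"] = pd.to_datetime(raw["date"])
    out["price"] = raw["price"]
    out["bedrooms"] = raw["bedrooms"]
    out["bathrooms"] = raw["bathrooms"]
    # Unit conversion from square feet to square meters.
    out["living_in_m2"] = (raw["sqft_living"] * SQFT_TO_M2).round(2)
    # Binary flags (assumed rules): any basement area, any renovation year, exactly one floor.
    out["has_basement"] = (raw["sqft_basement"] > 0).astype(int)
    out["renovated"] = (raw["yr_renovated"] > 0).astype(int)
    out["single_floor"] = (raw["floors"] == 1).astype(int)
    out["month"] = out["date"].dt.month
    # quartile_zone (assumed rule): rank ZIP codes by median sale price and
    # split them into quartiles, 1 = least expensive, 4 = most expensive.
    zip_median = raw.groupby("zipcode")["price"].median()
    zone = pd.qcut(zip_median, q=4, labels=[1, 2, 3, 4]).astype(int)
    out["quartile_zone"] = raw["zipcode"].map(zone)
    return out
```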
  17. 🇨🇭House Prices Switzerland 2024

    • kaggle.com
    zip
    Updated Mar 4, 2024
    Etienne Kaiser (郑翊天) (2024). 🇨🇭House Prices Switzerland 2024 [Dataset]. https://www.kaggle.com/datasets/etiennekaiser/switzerland-house-price-prediction-data/data
    Available download formats: zip (22841 bytes)
    Dataset updated
    Mar 4, 2024
    Authors
    Etienne Kaiser (郑翊天)
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Switzerland
    Description

    Overview

    The allure of Swiss houses lies in their harmonious integration with the breathtaking natural surroundings, offering residents a unique blend of tranquility and cosmopolitan living.

    Switzerland's renowned stability, both economically and politically, adds a layer of desirability to its real estate market. The country's commitment to quality living is reflected not only in its efficient infrastructure but also in its high standards of education, healthcare, and overall well-being.

    However, such a high quality of life and the unparalleled Swiss experience come at a cost. Swiss house prices reflect the exclusivity and desirability of the real estate market, making a purchase an investment in both luxury and lifestyle. In this journey through Swiss real estate, I unravel the layers of this captivating narrative, helping you explore not just the numbers but the essence of what makes owning a house in Switzerland an aspirational dream for many, probably including yours.

    File

    • house_price_switzerland.csv - The complete dataset with 11 columns.

    Columns

    • ID - A unique identifier for each object in the dataset.
    • HouseType - Describes the type of the house, such as "Villa".
    • Size - Represents a categorical size classification, for example, "L".
    • Price - Indicates the value of the house in Swiss Francs (CHF). If NaN, the price is listed as "Price on Request".
    • LotSize - Specifies the surrounding area of the property in square meters.
    • Balcony - Binary indicator of whether the object has a balcony (Yes/No).
    • LivingSpace - Denotes the living area of the house in square meters.
    • NumberRooms - Indicates the total number of rooms in the house.
    • YearBuilt - Represents the year in which the house was constructed.
    • Locality - Specifies the city or town where the object is situated.
    • PostalCode - Corresponds to the postal code of the locality where the object is located.
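
    A minimal pandas sketch for loading the Swiss file with the column names listed above. It keeps "Price on Request" listings flagged rather than dropping them, converts `Balcony` to a boolean (assuming the literal values "Yes"/"No"), and adds a derived price-per-square-meter column; the function name and the `PriceOnRequest`/`PricePerM2` columns are hypothetical additions, not part of the dataset.

```python
import pandas as pd


def prepare_swiss_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal cleaning sketch for house_price_switzerland.csv (assumptions noted inline)."""
    out = df.copy()
    # A NaN Price means "Price on Request"; keep an explicit flag instead of
    # silently dropping those listings.
    out["PriceOnRequest"] = out["Price"].isna()
    # Assumes Balcony holds the literal strings "Yes"/"No"; anything else becomes NaN.
    out["Balcony"] = out["Balcony"].map({"Yes": True, "No": False})
    # Hypothetical derived column: price per square meter of living space (CHF/m^2).
    # It stays NaN for price-on-request listings.
    out["PricePerM2"] = out["Price"] / out["LivingSpace"]
    return out


# Usage (file name from the dataset description):
# df = pd.read_csv("house_price_switzerland.csv")
# priced = prepare_swiss_listings(df).query("not PriceOnRequest")
```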
  18. Ames Housing Engineered Dataset

    • kaggle.com
    Updated Sep 27, 2025
    Atefeh Amjadian (2025). Ames Housing Engineered Dataset [Dataset]. https://www.kaggle.com/datasets/atefehamjadian/ameshousing-engineered
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 27, 2025
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Atefeh Amjadian
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Ames
    Description

    This dataset is an engineered version of the original Ames Housing dataset from the "House Prices: Advanced Regression Techniques" Kaggle competition. The goal of this engineering was to clean the data, handle missing values, encode categorical features, scale numeric features, manage outliers, reduce skewness, select useful features, and create new features to improve model performance for house price prediction.

    The original dataset contains information on 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, with the target variable being SalePrice. This engineered version has undergone several preprocessing steps to make it ready for machine learning models.

    Preprocessing Steps Applied

    1. Missing Value Handling: Missing values in categorical columns with meaningful absence (e.g., no pool for PoolQC) were filled with "None". Numeric columns were filled with the median, and other categorical columns with the mode.
    2. Correlation-based Feature Selection: Numeric features with absolute correlation < 0.1 with SalePrice were removed.
    3. Encoding Categorical Variables: Ordinal features (e.g., quality ratings) were encoded using OrdinalEncoder, and nominal features (e.g., neighborhoods) using OneHotEncoder.
    4. Outlier Handling: Outliers in numeric features were detected using IQR and capped (Winsorized) to IQR bounds to preserve data while reducing extreme values.
    5. Skewness Handling: Highly skewed numeric features (|skew| > 1) were transformed using Yeo-Johnson to bring their distributions closer to normal.
    6. Additional Feature Selection: Low-variance one-hot features (variance < 0.01) and highly collinear features (|corr| > 0.8) were removed.
    7. Feature Scaling: Numeric features were scaled using RobustScaler to handle outliers.
    8. Duplicate Removal: Duplicate rows were checked and removed if found (none in this dataset).
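
    Steps 1, 2, 4 and 8 above map directly onto pandas operations. The sketch below is an approximation written for illustration, not the author's code: the function names are hypothetical, the IQR multiplier `k=1.5` is the conventional default rather than a documented value, and excluding `SalePrice` from capping is an assumption. The scikit-learn parts (OrdinalEncoder/OneHotEncoder, Yeo-Johnson, RobustScaler) and step 6 are omitted to keep it short.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame, none_cols: list) -> pd.DataFrame:
    """Step 1: "None" for meaningful absence, median for numeric, mode otherwise."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if col in none_cols:
            out[col] = out[col].fillna("None")  # e.g. PoolQC missing = no pool
        elif pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def drop_weak_numeric(df: pd.DataFrame, target: str = "SalePrice", threshold: float = 0.1) -> pd.DataFrame:
    """Step 2: drop numeric features with |corr(feature, target)| < threshold."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    weak = corr[corr.abs() < threshold].index
    return df.drop(columns=weak)


def cap_outliers_iqr(df: pd.DataFrame, target: str = "SalePrice", k: float = 1.5) -> pd.DataFrame:
    """Step 4: winsorize numeric features to [Q1 - k*IQR, Q3 + k*IQR]."""
    out = df.copy()
    for col in out.select_dtypes("number").columns:
        if col == target:
            continue  # assumption: the target itself is left uncapped
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def preprocess(df: pd.DataFrame, none_cols: list) -> pd.DataFrame:
    """Steps 1, 2, 4 and 8 in the listed order; encoding, skew and scaling omitted."""
    out = fill_missing(df, none_cols)
    out = drop_weak_numeric(out)
    out = cap_outliers_iqr(out)
    return out.drop_duplicates().reset_index(drop=True)
```

    For example, `preprocess(train_df, none_cols=["PoolQC"])` would run the four steps on a hypothetical `train_df` loaded from the competition's training file.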

    One-hot encoding expanded the original 81 columns to approximately 250, and feature selection then reduced that count, yielding a dataset of improved quality for modeling.

    New Features Created

    To add more predictive power, the following new features were created based on domain knowledge:

    1. HouseAge: Age of the house at the time of sale. Calculated as YrSold - YearBuilt. This captures how old the house is, which can negatively affect price due to depreciation.
      • Example: A house built in 2000 and sold in 2008 has HouseAge = 8.
    2. Quality_x_Size: Interaction term between overall quality and living area. Calculated as OverallQual * GrLivArea. This combines quality and size to capture the value of high-quality large homes.
      • Example: A house with OverallQual = 7 and GrLivArea = 1500 has Quality_x_Size = 10500.
    3. TotalSF: Total square footage of the house. Calculated as GrLivArea + TotalBsmtSF + 1stFlrSF + 2ndFlrSF (if available). This aggregates area features into a single metric for better price prediction.
      • Example: If GrLivArea = 1500 and TotalBsmtSF = 1000, TotalSF = 2500.
    4. Log_LotArea: Log-transformed lot area to reduce skewness. Calculated as np.log1p(LotArea). This makes the distribution of lot sizes more normal, helping models handle extreme values.
      • Example: A lot area of 10000 becomes Log_LotArea ≈ 9.21.
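
    The four engineered features can be reproduced from the unscaled columns roughly as follows (a pandas sketch with a hypothetical function name, not the author's code). One caveat, based on the original Ames documentation rather than this description: GrLivArea is already the sum of 1stFlrSF, 2ndFlrSF and LowQualFinSF, so the sketch sums only the area columns actually present. The worked TotalSF example (1500 + 1000 = 2500) corresponds to passing only GrLivArea and TotalBsmtSF.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Builds HouseAge, Quality_x_Size, TotalSF and Log_LotArea from unscaled Ames columns."""
    out = df.copy()
    # HouseAge: years between construction and sale.
    out["HouseAge"] = out["YrSold"] - out["YearBuilt"]
    # Quality_x_Size: interaction of overall quality and above-grade living area.
    out["Quality_x_Size"] = out["OverallQual"] * out["GrLivArea"]
    # TotalSF: sum of whichever listed area columns are present.
    # Caveat: in the original Ames data GrLivArea already includes 1stFlrSF and
    # 2ndFlrSF, so supplying all four columns would count floor area twice.
    area_cols = [c for c in ("GrLivArea", "TotalBsmtSF", "1stFlrSF", "2ndFlrSF") if c in out.columns]
    out["TotalSF"] = out[area_cols].sum(axis=1)
    # Log_LotArea: log(1 + LotArea) to reduce right skew.
    out["Log_LotArea"] = np.log1p(out["LotArea"])
    return out
```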

    These new features were created using the original (unscaled) values to maintain interpretability, then scaled with RobustScaler to match the rest of the dataset.

    Data Dictionary

    • Original Numeric Features: Kept features with |corr| > 0.1 with SalePrice, such as:
      • OverallQual: Material and finish quality (scaled, 1-10).
      • GrLivArea: Above grade (ground) living area square feet (scaled).
      • GarageCars: Size of garage in car capacity (scaled).
      • TotalBsmtSF: Total square feet of basement area (scaled).
      • And others like FullBath, YearBuilt, etc. (see the code for the full list).
    • Ordinal Encoded Features: Quality and condition ratings, e.g.:
      • ExterQual: Exterior material quality (encoded as 0=Po to 4=Ex).
      • BsmtQual: Basement quality (encoded as 0=None to 5=Ex).
    • One-Hot Encoded Features: Nominal categorical features, e.g.:
      • MSZoning_RL: 1 if residential low density, 0 otherwise.
      • Neighborhood_NAmes: 1 if in NAmes neighborhood, 0 otherwise.
    • New Engineered Features (as described above):
      • HouseAge: Age of the house (scaled).
      • Quality_x_Size: Overall quality times living area (scaled).
      • TotalSF: Total square footage (scaled).
      • Log_LotArea: Log-transformed lot area (scaled).
    • Target: SalePrice - The property's sale price in dollars (not scaled, as it's the target).

    Total columns: Approximately 200-250 (after one-hot encoding and feature selection).
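
    The ordinal and one-hot encodings in the data dictionary can be approximated in plain pandas. This is a stand-in for the OrdinalEncoder/OneHotEncoder step, not the author's code; the function name is hypothetical, and the intermediate levels Fa, TA and Gd come from the original Ames codebook rather than from this description.

```python
import pandas as pd

# Level orderings, worst to best. The endpoints match the data dictionary
# (ExterQual 0=Po..4=Ex, BsmtQual 0=None..5=Ex); the middle levels Fa, TA, Gd
# follow the original Ames codebook.
EXTER_QUAL_LEVELS = ["Po", "Fa", "TA", "Gd", "Ex"]
BSMT_QUAL_LEVELS = ["None", "Po", "Fa", "TA", "Gd", "Ex"]


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """pandas stand-in for the OrdinalEncoder/OneHotEncoder step; not the author's code."""
    out = df.copy()
    out["ExterQual"] = out["ExterQual"].map({lvl: i for i, lvl in enumerate(EXTER_QUAL_LEVELS)})
    # A missing BsmtQual means "no basement", which the dictionary encodes as 0.
    out["BsmtQual"] = out["BsmtQual"].fillna("None").map(
        {lvl: i for i, lvl in enumerate(BSMT_QUAL_LEVELS)}
    )
    # One-hot columns are named <feature>_<level>, e.g. MSZoning_RL, Neighborhood_NAmes.
    return pd.get_dummies(out, columns=["MSZoning", "Neighborhood"], dtype=int)
```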

    License

    This dataset is derived from the Ames Housing...

  19. house_data

    • kaggle.com
    Updated Jul 27, 2022
    Arathi P Raj (2022). house_data [Dataset]. https://www.kaggle.com/datasets/arathipraj/house-data
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2022
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Arathi P Raj
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Content

    The dataset consists of prices of houses in King County, Washington, sold between May 2014 and May 2015. Along with the house price, it includes information on 18 house features, the date of sale, and the sale ID.

    Attribute information

    1. id - Unique id for each home sold
    2. date - Date of the home sale
    3. price - Price of each home sold
    4. bedrooms - Number of bedrooms
    5. bathrooms - Number of bathrooms
    6. sqft_living - Square footage of the apartment's interior living space
    7. sqft_lot - Square footage of the land space
    8. floors - Number of floors
    9. waterfront - A dummy variable for whether the apartment was overlooking the waterfront or not
    10. view - An index from 0 to 4 of how good the view of the property was
    11. condition - An index from 1 to 5 on the condition of the apartment
    12. grade - An index from 1 to 13, where 1-3 fall short of building construction and design, 7 has an average level of construction and design, and 11-13 have a high quality level of construction and design
    13. sqft_above - The square footage of the interior housing space that is above ground level
    14. sqft_basement - The square footage of the interior housing space that is below ground level
    15. yr_built - The year the house was initially built
    16. yr_renovated - The year of the house's last renovation
    17. zipcode - What zipcode area the house is in
    18. lat - Latitude
    19. long - Longitude
    20. sqft_living15 - The square footage of interior housing living space for the nearest 15 neighbours
    21. sqft_lot15 - The square footage of the land lots of the nearest 15 neighbours
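
    A few derived columns follow directly from the attribute definitions above. The sketch assumes `date` strings in the compact form `20141013T000000` (adjust `date_format` if your copy differs) and that `yr_renovated` is 0 for houses never renovated; the function name and derived column names are hypothetical.

```python
import pandas as pd


def add_sale_features(df: pd.DataFrame, date_format: str = "%Y%m%dT%H%M%S") -> pd.DataFrame:
    """Derived columns built only from the attributes listed above.

    date_format assumes strings like "20141013T000000"; adjust it if the file differs.
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], format=date_format)
    out["sale_year"] = out["date"].dt.year
    out["sale_month"] = out["date"].dt.month
    # Age at sale, from the year built.
    out["age_at_sale"] = out["sale_year"] - out["yr_built"]
    # Assumes yr_renovated == 0 means the house was never renovated.
    out["was_renovated"] = (out["yr_renovated"] > 0).astype(int)
    out["price_per_sqft"] = out["price"] / out["sqft_living"]
    # By the definitions, living space = above-ground + below-ground space,
    # so rows where this identity fails are worth inspecting.
    out["area_mismatch"] = out["sqft_living"] != out["sqft_above"] + out["sqft_basement"]
    return out
```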
  20. KC_House_Data

    • kaggle.com
    zip
    Updated Aug 14, 2021
    astronaut_elvis (2021). KC_House_Data [Dataset]. https://www.kaggle.com/datasets/astronautelvis/kc-house-data/code
    Available download formats: zip (831605 bytes)
    Dataset updated
    Aug 14, 2021
    Authors
    astronaut_elvis
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The dataset for this project originates from the UCI Machine Learning Repository. There are similar datasets on Kaggle, but this one is more comprehensive. It shows basic trends in house pricing in terms of location, construction area, interior, and so on.

    Content

    The dataset contains 21613 rows and 22 columns. Column names are self-explanatory.

    Inspiration

    • What are the busiest times of the year for home sales? By how much do prices spike?
    • Is there a general upward trend in prices, and which factors contribute the most?
    • Exploratory data analysis
    • Price analysis by area based on latitude and longitude

    Column Names

    id - Unique ID for each home sold

    date - Date of the home sale

    price - Price of each home sold

    bedrooms - Number of bedrooms

    bathrooms - Number of bathrooms, where .5 accounts for a room with a toilet but no shower

    sqft_living - Square footage of the apartment interior living space

    sqft_lot - Square footage of the land space

    floors - Number of floors

    waterfront - A dummy variable for whether the apartment was overlooking the waterfront or not

    view - An index from 0 to 4 of how good the view of the property was

    condition - An index from 1 to 5 on the condition of the apartment.

    grade - An index from 1 to 13, where 1-3 fall short of building construction and design, 7 has an average level of construction and design, and 11-13 have a high-quality level of construction and design.

    sqft_above - The square footage of the interior housing space that is above ground level

    sqft_basement - The square footage of the interior housing space that is below ground level

    yr_built - The year the house was initially built

    yr_renovated - The year of the house’s last renovation

    zipcode - What zipcode area the house is in

    lat - Latitude

    long - Longitude

    sqft_living15 - The square footage of interior housing living space for the nearest 15 neighbors

    sqft_lot15 - The square footage of the land lots of the nearest 15 neighbors
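
    Two of the Inspiration questions, seasonality of sales and price by location, reduce to simple group-bys. The sketch below is illustrative only: the 0.05-degree grid cell size, the compact date format `20141013T000000`, and the function names are assumptions rather than anything specified by the dataset.

```python
import numpy as np
import pandas as pd


def monthly_summary(df: pd.DataFrame, date_format: str = "%Y%m%dT%H%M%S") -> pd.DataFrame:
    """Sales count and median price per calendar month (seasonality question)."""
    month = pd.to_datetime(df["date"], format=date_format).dt.month.rename("month")
    return df.groupby(month)["price"].agg(sales="count", median_price="median")


def price_grid(df: pd.DataFrame, cell_deg: float = 0.05) -> pd.DataFrame:
    """Median price per lat/long grid cell (area question); cell size is an arbitrary choice."""
    # Integer cell indices avoid grouping on floating-point coordinates.
    lat_bin = np.floor(df["lat"] / cell_deg).astype(int).rename("lat_bin")
    long_bin = np.floor(df["long"] / cell_deg).astype(int).rename("long_bin")
    return (
        df.groupby([lat_bin, long_bin])["price"]
        .agg(sales="count", median_price="median")
        .reset_index()
    )
```

    Smaller cells give finer maps but fewer sales per cell, so the median becomes noisier; the cell size is a trade-off to tune.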


US Cost of Living Dataset (1877 Counties)

Modest yet adequate family budget estimates for 1877 US counties



Check out my other datasets

Employment-to-Population Ratio for USA

Productivity and Hourly Compensation

130K Kindle Books

900K TMDb Movies

USA Unemployment Rates by Demographics & Race

Photo by Alev Takil on Unsplash
