Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘California Housing Prices Data (5 new features!)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset is a modified version of the California Housing Data used in the paper Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297. It serves as an excellent introduction to implementing machine learning algorithms because it requires rudimentary data cleaning, has an easily understandable list of variables, and sits at an optimal size between being too toyish and too cumbersome.
The data contains information from the 1990 California census. So although it may not help you with predicting current housing prices like the Zillow Zestimate dataset, it does provide an accessible introductory dataset for teaching people about the basics of machine learning.
This dataset includes 5 extra features defined by me: "Distance to coast", "Distance to Los Angeles", "Distance to San Diego", "Distance to San Jose", and "Distance to San Francisco". These extra features try to account for the distance to the nearest coast and the distance to the centre of the largest cities in California.
The distances were calculated using the Haversine formula with the Longitude and Latitude:
$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$
where:
$\phi_1$ and $\phi_2$ are the Latitudes of point 1 and point 2, respectively
$\lambda_1$ and $\lambda_2$ are the Longitudes of point 1 and point 2, respectively
$r$ is the radius of the Earth (6371 km)
The data pertains to the houses found in a given California district and some summary stats about them based on the 1990 census data. The columns are as follows; their names are pretty self-explanatory:
1) Median House Value: Median house value for households within a block (measured in US Dollars) [$]
2) Median Income: Median income for households within a block of houses (measured in tens of thousands of US Dollars) [10k$]
3) Median Age: Median age of a house within a block; a lower number is a newer building [years]
4) Total Rooms: Total number of rooms within a block
5) Total Bedrooms: Total number of bedrooms within a block
6) Population: Total number of people residing within a block
7) Households: Total number of households, a group of people residing within a home unit, for a block
8) Latitude: A measure of how far north a house is; a higher value is farther north [°]
9) Longitude: A measure of how far west a house is; a higher value is farther west [°]
10) Distance to coast: Distance to the nearest coast point [m]
11) Distance to Los Angeles: Distance to the centre of Los Angeles [m]
12) Distance to San Diego: Distance to the centre of San Diego [m]
13) Distance to San Jose: Distance to the centre of San Jose [m]
14) Distance to San Francisco: Distance to the centre of San Francisco [m]
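As a rough illustration of how the distance columns could be derived from Latitude and Longitude, here is a minimal Python sketch of the Haversine formula described above. It is not the dataset author's code: the names `haversine_m` and `EARTH_RADIUS_M` are hypothetical, and the example coordinates are approximate values chosen only for illustration. The Earth radius of 6371 km comes from the description and is converted to metres to match the [m] unit of the distance columns.

```python
import math

# Earth radius from the dataset description (6371 km), expressed in metres.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Term under the square root in the Haversine formula.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


# Illustrative call only: both coordinate pairs are assumed, approximate values
# (a block in the Bay Area and a point near central Los Angeles).
example_distance_m = haversine_m(37.88, -122.23, 34.05, -118.24)
```

Applying such a function to each block's Latitude and Longitude against a fixed reference point would yield one distance column. The reference coordinates the author actually used for each city centre and for the coast are not given in the description.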
This data was entirely modified and cleaned by me. The original data (without the distance features) was initially featured in the following paper: Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297.
The original dataset can be found under the following link: https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html
--- Original source retains full ownership of the source dataset ---
The number of U.S. home sales declined in 2024, after soaring in 2021. A total of four million transactions of existing homes, including single-family homes, condos, and co-ops, were completed in 2024, down from 6.12 million in 2021. The housing market is forecast to head for recovery in 2025, although transaction volumes are expected to remain below the long-term average.
Why have home sales declined?
The housing boom during the coronavirus pandemic demonstrated that being a homeowner is still an integral part of the American dream. Nevertheless, sentiment declined in the second half of 2022, and Americans across all generations agreed that the time was not right to buy a home. A combination of factors has sent house prices rocketing and made homeownership unaffordable for the average buyer. A survey among owners and renters found that high home prices and unfavorable economic conditions were the two main barriers to making a home purchase. People who would like to purchase their own home need to save up a deposit and have a good credit score and a steady, sufficient income to be approved for a mortgage. In 2022, mortgage rates experienced the most aggressive increase in history, making the total cost of homeownership substantially higher.
Are U.S. home prices expected to fall?
The median sales price of existing homes stood at 413,000 U.S. dollars in 2024 and was forecast to increase slightly until 2026. The development of the S&P/Case Shiller U.S. National Home Price Index shows that home prices experienced seven consecutive months of decline between June 2022 and January 2023, but this trend reversed in the following months. Despite mild fluctuations throughout the year, home prices in many metros are forecast to continue to grow, albeit at a much slower rate.
In 2022, San Mateo, San Francisco, and Santa Clara were the most expensive districts for housing in the San Francisco Bay Area. In San Francisco, the average price of single-family homes exceeded 1,000 U.S. dollars per square foot. Housing in Solano, on the other hand, was the most affordable, with single-family homes averaging 370 U.S. dollars per square foot.
How expensive is buying a home in San Francisco?
Few metros in the U.S. are more expensive than San Francisco, CA. In 2022, the median sales price of existing single-family homes in San Francisco was about 1.4 million U.S. dollars, making it the second priciest market in the U.S. House prices in the Golden City were not always so high: in 2014, a two-bedroom house in the Bay Area would sell for less than 500,000 U.S. dollars, but since then, the median price has more than doubled.
How much does renting an apartment cost?
Despite rents falling in 2020, renting in San Francisco is still far from cheap. Renting a two-bedroom apartment cost close to 4,000 U.S. dollars in 2021. California is one of the least affordable states for renters. In fact, to afford to rent such an apartment, a household needs approximately three full-time jobs at minimum wage or two full-time jobs at mean wage.
West Virginia and Kansas had the lowest cost of living across all U.S. states, with composite costs being half of those found in Hawaii. This was according to a composite index that compares prices for various goods and services on a state-by-state basis. In West Virginia, the cost of living index amounted to ****, well below the national benchmark of 100. Virginia, which had an index value of *****, was only slightly above that benchmark. Expensive places to live included Hawaii, Massachusetts, and California.
Housing costs in the U.S.
Housing is usually the highest expense in a household’s budget. In 2023, the average house sold for approximately ******* U.S. dollars, but house prices in the Northeast and West regions were significantly higher. Conversely, the South had some of the least expensive housing. In West Virginia, Mississippi, and Louisiana, the median price of the typical single-family home was less than ******* U.S. dollars. That makes living expenses in these states significantly lower than in states such as Hawaii and California, where housing is much pricier.
What other expenses affect the cost of living?
Utility costs such as electricity, natural gas, water, and internet also influence the cost of living. In Alaska, Hawaii, and Connecticut, the average monthly utility cost exceeded *** U.S. dollars. That was because of the significantly higher prices for electricity and natural gas in these states.
The average monthly rent in California for a two-bedroom apartment was 2,464 U.S. dollars in 2024, while a one-bedroom unit cost 1,989 U.S. dollars. Only renters who earn the area median income (AMI) can afford two-bedroom housing in California. The rent affordable to renters working full-time at the mean renter wage, or earning 30 percent of the area median income, was lower than the fair market rent of both two-bedroom and one-bedroom apartments in California, making such housing unaffordable for them. In 2024, California had the highest rent for a two-bedroom apartment of any state in the United States.
In California, the estimated fair market rent for a two-bedroom accommodation amounted to 2,464 U.S. dollars in 2024. It was one of the least affordable states in terms of housing that year, as someone would need to earn at least twice the minimum wage to afford a two-bedroom rental unit there.
In the first quarter of 2025, San Francisco, Chicago, New York, and Honolulu were some of the U.S. cities with the highest housing construction costs. Meanwhile, Phoenix had one of the lowest construction costs for high-end multifamily homes at *** U.S. dollars per square foot, and Las Vegas had one of the lowest for single-family homes, between *** and *** U.S. dollars per square foot.
Construction cost disparities
As seen here, the construction cost for a high-end multifamily home in San Francisco in the first quarter of 2024 was over ***** more expensive than in Phoenix. Meanwhile, there were also great differences in the cost of building a single-family house in New York and in Portland or Seattle. Some factors that may cause these disparities are the construction materials, installation, and composite costs, differing land values, wages, etc. For example, although the price of construction materials in the U.S. was rising at a slower rate than in 2022 and 2023, several materials that are essential in most construction projects had growth rates of over **** percent in 2024.
Growing industry revenue
Despite the economic uncertainty and other challenges, the size of the private construction market in the U.S. rose during the past years. It is important to consider that supply and demand for housing influence the revenue of this segment of the construction market. On the supply side, single-family home construction fell in 2023, but it is expected to rise in 2024 and 2025. On the demand side, some of the U.S. metropolitan areas with the highest sale prices of single-family homes were located in California, with San Jose-Sunnyvale-Santa Clara at the top of the ranking.
In 2024, the average value of U.S. farm real estate was 4,170 U.S. dollars per acre. Compared to one decade earlier, the value has increased by almost 40 percent. Generally, the value of U.S. farm real estate has had an upward trend since 1970.
U.S. farms
The number of farms in the United States has conversely been decreasing each year, reaching about two million farms as of 2022. That year, Texas had by far the most farms of any U.S. state, with about 246,000 farms. Missouri and Iowa had the second and third most farms, though neither state exceeded 100,000 farms.
Agricultural trade
Agricultural products encompass any products of agricultural origin that are meant for human consumption or animal feed. Agricultural products can include livestock products or crops. In 2022, the U.S. exported about 196.4 billion U.S. dollars’ worth of agricultural goods worldwide, an increase over the previous several years. Mexico is a key destination for U.S. agricultural products and imported just over 28 billion dollars’ worth in 2022, more than Europe and Eurasia combined.
The average monthly rent for all apartment types in the U.S. soared in 2021 and 2022, followed by a slight decline in the next two years. In April 2025, the monthly rent for a two-bedroom apartment amounted to ***** U.S. dollars. That was an increase from ***** U.S. dollars in January 2021, but a decline from the peak value of ***** U.S. dollars in August 2022.
Where are the most expensive apartments in the U.S.?
Apartment rents vary widely from state to state. To afford a two-bedroom apartment in California, for example, a renter needed to earn an average hourly wage of nearly ** U.S. dollars, which was approximately double the average wage in North Carolina and three times as much as the average wage in Arkansas. In fact, rental costs were considerably higher than the hourly minimum wage in all U.S. states.
How did rents change in different states in the U.S.?
In 2024, some of the most expensive states to rent an apartment only saw a moderate increase in rental prices. Nevertheless, rents increased in most states as of April 2025. In West Virginia, the annual rental growth was the highest, at ***** percent.
Rents for industrial real estate in the U.S. have increased since 2017, with flexible/service space reaching the highest price per square foot in 2024. In just a year, the cost of flex/service space rose by nearly ***** U.S. dollars per square foot. Manufacturing facilities, warehouses, and distribution centers had lower rents and experienced milder growth. Los Angeles, Orange County, and Inland Empire, California, are some of the most expensive markets in the country.
Office real estate is pricier
Industrial real estate is far from being the most expensive commercial property type. For instance, average rental rates in major U.S. metros for office space are much higher than those for industrial space. This is most likely because office units are generally located in urban areas where there is limited space and thus higher demand, whereas industrial units are more suited to the outskirts of such urban areas. Industrial units, such as warehouses or factories, require much more space because they need to house large, heavy equipment or serve as a storage unit for future shipments.
Big-box distribution space is gaining in importance
Warehouses and distribution centers may currently command the lowest average rent per square foot among industrial space types, but the growing popularity of the asset class has earned it considerable gains over the past years. In 2021 and 2022, high occupier demand and insufficient supply led to soaring taking rents for big-box buildings. During that time, the vacancy rate of distribution centers fell below **** percent. The development of industrial and logistics facilities has accelerated since then, with new supply coming to market, causing the vacancy rate to increase and the pressure on rents to ease.