Median house prices for California districts derived from the 1990 census.
About Dataset
Context
This is the dataset used in the second chapter of Aurélien Géron's recent book 'Hands-On Machine Learning with Scikit-Learn and TensorFlow'. It serves as an excellent introduction to implementing machine learning algorithms because it requires rudimentary data cleaning, has an easily understandable list of variables, and sits at an optimal size between being too toyish and too cumbersome.
The data contains information from the 1990 California census. So although it may not help you predict current housing prices the way the Zillow Zestimate dataset can, it does provide an accessible introductory dataset for teaching people the basics of machine learning.
Content
The data pertains to the houses found in a given California district and some summary statistics about them based on the 1990 census data. Be warned: the data aren't cleaned, so some preprocessing steps are required! The columns are as follows; their names are pretty self-explanatory:
- longitude
- latitude
- housing_median_age
- total_rooms
- total_bedrooms
- population
- households
- median_income
- median_house_value
- ocean_proximity
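The preprocessing the description warns about can be sketched with the standard library alone. This is a minimal sketch, not the book's pipeline: it assumes a CSV with exactly the ten columns listed above, treats an empty numeric cell as missing, fills gaps with the column median, and one-hot encodes the text column `ocean_proximity`. The helper names (`load_rows`, `impute_median`, `one_hot`) and the three sample rows are made up for illustration.

```python
import csv
import io
from statistics import median

# Numeric columns, as listed in the dataset description.
NUMERIC_COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value",
]


def load_rows(text):
    """Parse CSV text into dicts; empty numeric cells become None (missing)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {col: float(raw[col]) if raw[col].strip() else None
               for col in NUMERIC_COLUMNS}
        row["ocean_proximity"] = raw["ocean_proximity"]
        rows.append(row)
    return rows


def impute_median(rows, column):
    """Fill missing values in `column` with the median of the observed values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill


def one_hot(rows, column):
    """Add one 0/1 indicator per category of a text column; return the categories."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        for cat in categories:
            r[f"{column}={cat}"] = int(r[column] == cat)
    return categories


# Hypothetical rows (not real census values); the second lacks total_bedrooms.
SAMPLE = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.0,37.0,40,1000,100,300,100,8.0,400000,NEAR BAY
-121.0,37.5,20,2000,,600,200,4.0,250000,INLAND
-118.0,34.0,30,3000,300,900,300,3.0,150000,INLAND
"""

rows = load_rows(SAMPLE)
print(impute_median(rows, "total_bedrooms"))  # 200.0
print(one_hot(rows, "ocean_proximity"))       # ['INLAND', 'NEAR BAY']
```

In a real workflow the fill value would normally be computed on the training split only, so that test rows do not leak into it.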
Acknowledgements
This data was initially featured in the following paper: Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297.
I encountered it in 'Hands-On Machine Learning with Scikit-Learn and TensorFlow' by Aurélien Géron, who wrote: "This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto)."
Inspiration
See my kernel on machine learning basics in R using this dataset, or venture over to the following link for a Python-based introductory tutorial: https://github.com/ageron/handson-ml/tree/master/datasets/housing
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Data on resident owners who are persons occupying one of their residential properties: sex, age, total income, the type and the assessment value of the owner-occupied property, as well as the number and the total assessment value of residential properties owned.
Data on resident owners who are persons occupying one of their residential properties: immigration characteristics (immigration status, period of immigration, admission category, place of birth), age, total family income, the number and the total assessment value of residential properties owned.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table is part of a series of tables that present a portrait of Canada based on the various census topics. The tables range in complexity and levels of geography. Content varies from a simple overview of the country to complex cross-tabulations; the tables may also cover several censuses.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Data on resident buyers who are persons that purchased a residential property in a market sale and filed their T1 tax return form: number of and incomes of residential property buyers, sale price, price-to-income ratio by the number of buyers as part of a sale, age groups, first-time home buyer status, buyer characteristics (sex, family type, immigration status, period of immigration, admission category).
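As a rough illustration of the price-to-income ratio, the sketch below divides a sale price by the combined total income of the buyers on that sale. That definition is an assumption made here for illustration; the table's own methodology notes should be checked before relying on it. The function name and figures are hypothetical.

```python
def price_to_income_ratio(sale_price, buyer_incomes):
    """Sale price divided by the combined total income of all buyers on the sale.

    Assumed definition, for illustration only.
    """
    total_income = sum(buyer_incomes)
    if total_income <= 0:
        raise ValueError("combined buyer income must be positive")
    return sale_price / total_income


# Hypothetical two-buyer sale.
print(price_to_income_ratio(600_000, [80_000, 70_000]))  # 4.0
```

Tabulating the ratio "by the number of buyers as part of a sale" then amounts to grouping sales by `len(buyer_incomes)` before summarizing.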
This dataset contains unemployment rates for the U.S. (1948 - present) and California (1976 - present). The unemployment rate represents the number of unemployed as a percentage of the labor force. Labor force data are restricted to people 16 years of age and older who currently reside in one of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces. This rate is also defined as the U-3 measure of labor underutilization.
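The rate described above is a simple ratio. The sketch below assumes the labor force is the sum of employed and unemployed people who already meet the eligibility rules listed (16+, not institutionalized, not on active duty); the function name and counts are hypothetical.

```python
def unemployment_rate(unemployed, employed):
    """U-3 style rate: unemployed as a percentage of the labor force.

    Assumes both counts already exclude people outside the labor-force
    universe (under 16, institutionalized, on active military duty).
    """
    labor_force = employed + unemployed
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return 100.0 * unemployed / labor_force


print(unemployment_rate(unemployed=6, employed=94))  # 6.0
```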
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Hollister, CA, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Hollister, CA median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
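A common way to act on that caution is to turn each margin of error into a confidence interval and to test whether two estimates actually differ. The sketch below assumes the margins of error follow the ACS convention of a 90% confidence level (z = 1.645) and that the two estimates are independent; the function names and dollar figures are hypothetical.

```python
import math

# Assumed: margins of error are published at the 90% confidence level (ACS convention).
Z_90 = 1.645


def standard_error(moe):
    """Convert a 90% margin of error to a standard error."""
    return moe / Z_90


def confidence_interval(estimate, moe):
    """Lower and upper bounds of the 90% confidence interval."""
    return estimate - moe, estimate + moe


def differs_significantly(est1, moe1, est2, moe2, z_crit=Z_90):
    """Test whether two independent estimates differ at the 90% level."""
    se_diff = math.sqrt(standard_error(moe1) ** 2 + standard_error(moe2) ** 2)
    return abs(est1 - est2) / se_diff > z_crit


print(confidence_interval(80_000, 5_000))                   # (75000, 85000)
print(differs_significantly(80_000, 5_000, 70_000, 4_000))  # True
print(differs_significantly(80_000, 5_000, 78_000, 4_000))  # False
```

Overlapping intervals are not by themselves proof that two estimates are equal, which is why the explicit difference test is used here.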
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Hollister median household income. You can refer to it here
Housing under construction in Canada 2006-2023
Published by Fernando de Querol Cumbrera, Jun 7, 2024
In 2023, there were approximately 354,730 housing units under construction in population centers of over 10,000 people in Canada. These 2023 numbers were lower than the figures a year earlier. However, during most of the period considered, the number of homes under construction generally increased; there were fewer than 145,000 homes under construction in 2009.
Development of residential construction in Canada
The number of housing starts overall has developed similarly, but it was somewhat more volatile. Generally, the cities with the largest populations, like Toronto and Montreal, experience the highest number of construction starts. The construction industry remains vital to Canada’s economy, providing employment to people across the country and billions in economic output.
Types of housing
Although many Canadians live in single-detached houses, their cities started embracing multifamily housing starts quite early and are moving away from single-family residential construction. Younger demographics in Canada, including new Canadians and young families, often strive toward homeownership but are also aware of climate change. High-energy-performance housing often comes with a higher price tag, but efforts are being made nationally to enshrine affordability as a core objective within national building codes.
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
This dataset contains two tables on the percentage of households experiencing overcrowding (> 1.0 persons per room) and severe overcrowding (> 1.5 persons per room) for California, its regions, counties, and cities/towns. Data are from the U.S. Department of Housing and Urban Development (HUD) Comprehensive Housing Affordability Strategy (CHAS) and the U.S. Census American Community Survey (ACS). The table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity.
Residential crowding has been linked to an increased risk of infection from communicable diseases, a higher prevalence of respiratory ailments, and greater vulnerability to homelessness among the poor. Residential crowding reflects demographic and socioeconomic conditions: older-adult immigrant and recent immigrant communities, families with low income, and renter-occupied households are more likely to experience household crowding. A form of residential overcrowding known as "doubling up" (co-residence with family members or friends for economic reasons) is the most commonly reported prior living situation for families and individuals before the onset of homelessness. More information about the data table and a data dictionary can be found in the About/Attachments section.
The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health.
The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture and others. Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. More information on HCI can be found here: https://www.cdph.ca.gov/Programs/OHE/CDPH%20Document%20Library/Accessible%202%20CDPH_Healthy_Community_Indicators1pager5-16-12.pdf
The format of the household overcrowding tables is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID, and indicator definition). Some of these variables may contain the same value for all observations.
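The two crowding thresholds (> 1.0 and > 1.5 persons per room) can be illustrated with a short sketch. It assumes household-level (persons, rooms) pairs and computes unweighted shares; the published CHAS/ACS tables are built from weighted survey data, so this only shows how the thresholds classify households. Function names and sample households are hypothetical.

```python
def persons_per_room(persons, rooms):
    """Occupants divided by rooms for one household."""
    if rooms <= 0:
        raise ValueError("rooms must be positive")
    return persons / rooms


def crowding_shares(households):
    """Percent of households above 1.0 and above 1.5 persons per room.

    `households` is a list of (persons, rooms) pairs. Unweighted, for
    illustration only. Every severely crowded household (> 1.5) is also
    counted in the overcrowded share (> 1.0).
    """
    ratios = [persons_per_room(p, r) for p, r in households]
    n = len(ratios)
    over = sum(ratio > 1.0 for ratio in ratios)
    severe = sum(ratio > 1.5 for ratio in ratios)
    return 100.0 * over / n, 100.0 * severe / n


sample = [(2, 4), (5, 4), (7, 4), (3, 3)]  # hypothetical households
print(crowding_shares(sample))  # (50.0, 25.0)
```

Note that a household at exactly 1.0 persons per room (the fourth sample) is not counted, because both thresholds are strict inequalities.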
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a breakdown of households across various income brackets in San Marino, CA, as reported by the U.S. Census Bureau. The Census Bureau classifies households into different categories, including total households, family households, and non-family households. Our analysis of U.S. Census Bureau American Community Survey data for San Marino, CA reveals how household income distribution varies among these categories. The dataset highlights the variation in the number of households by income level, offering valuable insights into the distribution of San Marino households by income.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for San Marino median household income. You can refer to it here
Data on the number of residential property owners and their assessment value by ownership type, residency status, and number of properties owned, as well as data on the number of resident buyers of properties sold in a market or non-market sale during the previous reference period, and data on the sale price of those properties sold in a market sale.
The feature set indicates the locations, and tenant characteristics of public housing development buildings for the San Francisco Bay Region. This feature set, extracted by the Metropolitan Transportation Commission, is from the statewide public housing buildings feature layer provided by the California Department of Housing and Community Development (HCD). HCD itself extracted the California data from the United States Department of Housing and Urban Development (HUD) feature service depicting the location of individual buildings within public housing units throughout the United States.
According to HUD's Public Housing Program, "Public Housing was established to provide decent and safe rental housing for eligible low-income families, the elderly, and persons with disabilities. Public housing comes in all sizes and types, from scattered single family houses to high-rise apartments for elderly families. There are approximately 1.2 million households living in public housing units, managed by some 3,300 housing agencies. HUD administers federal aid to local housing agencies that manage the housing for low-income residents at rents they can afford. HUD furnishes technical and professional assistance in planning, developing and managing these developments."
HUD administers Federal aid to local Housing Agencies (HAs) that manage housing for low-income residents at rents they can afford. Likewise, HUD furnishes technical and professional assistance in planning, developing, and managing the buildings that comprise low-income housing developments. This feature set provides the location, and resident characteristics of public housing development buildings.
Location data for HUD-related properties and facilities are derived from HUD's enterprise geocoding service. While not all addresses can be geocoded and mapped with 100% accuracy, we are continuously working to improve address data quality and enhance coverage. Please consider this issue when using any datasets provided by HUD. When using this data, take note of the field titled “LVL2KX”, which indicates the overall accuracy of the geocoded address using the following return codes:
‘R’ - Interpolated rooftop (high degree of accuracy, symbolized as green)
‘4’ - ZIP+4 centroid (high degree of accuracy, symbolized as green)
‘B’ - Block group centroid (medium degree of accuracy, symbolized as yellow)
‘T’ - Census tract centroid (low degree of accuracy, symbolized as red)
‘2’ - ZIP+2 centroid (low degree of accuracy, symbolized as red)
‘Z’ - ZIP5 centroid (low degree of accuracy, symbolized as red)
‘5’ - ZIP5 centroid (same as above, low degree of accuracy, symbolized as red)
Null - Could not be geocoded (does not appear on the map)
When displaying the location of an address on a map, use only addresses and their associated lat/long coordinates where the LVL2KX field is coded ‘R’ or ‘4’. These codes ensure that the address is displayed on the correct street segment and in the correct census block. The remaining LVL2KX codes provide a cascading indication of the most granular geography for which an address can be confirmed. For example, if an address cannot be accurately interpolated to a rooftop (‘R’) or ZIP+4 centroid (‘4’), then the address will be mapped to the centroid of the next nearest confirmed geography: block group, tract, and so on. When performing any point-in-polygon analysis, note that points mapped to the centroids of larger geographies are less likely to map accurately to the smaller geographies of the same area. For instance, a point coded as ‘5’ in the correct ZIP Code is less likely to map to the correct block group or census tract for that address. To protect Personally Identifiable Information, the characteristics for each building are suppressed with a -4 value when “Number_Reported” is less than or equal to 10.
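The two rules above (map only ‘R’ and ‘4’ records, and treat -4 as a suppressed value rather than a number) can be applied with a short filter. This is a sketch under assumptions: records are plain dicts keyed by the field names given in the text (`LVL2KX`, `Number_Reported`), a null code arrives as `None`, and `PCT_ELDERLY` is a hypothetical stand-in for whichever building characteristic is being analysed.

```python
# Geocoding accuracy by LVL2KX return code, as listed in the description.
ACCURACY = {
    "R": "high", "4": "high",
    "B": "medium",
    "T": "low", "2": "low", "Z": "low", "5": "low",
    None: "not geocoded",  # assumed representation of a null code
}
MAPPABLE_CODES = {"R", "4"}
SUPPRESSED = -4  # sentinel used when Number_Reported <= 10


def mappable(records):
    """Keep only records whose coordinates are safe to display on a map."""
    return [r for r in records if r.get("LVL2KX") in MAPPABLE_CODES]


def clean_characteristics(record, fields):
    """Copy a record, turning suppressed (-4) characteristic values into None."""
    cleaned = dict(record)
    for field in fields:
        if cleaned.get(field) == SUPPRESSED:
            cleaned[field] = None
    return cleaned


records = [  # hypothetical rows; "PCT_ELDERLY" is an illustrative field name
    {"LVL2KX": "R", "Number_Reported": 50, "PCT_ELDERLY": 30},
    {"LVL2KX": "5", "Number_Reported": 40, "PCT_ELDERLY": 12},
    {"LVL2KX": "4", "Number_Reported": 8, "PCT_ELDERLY": -4},
    {"LVL2KX": None, "Number_Reported": 20, "PCT_ELDERLY": 25},
]
on_map = [clean_characteristics(r, ["PCT_ELDERLY"]) for r in mappable(records)]
print(len(on_map), on_map[1]["PCT_ELDERLY"])  # 2 None
```

Converting the sentinel to `None` keeps a suppressed -4 from being averaged into summary statistics as if it were a real value.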
HCD downloaded the HUD data in April 2021. They sourced the data from https://hub.arcgis.com/datasets/fedmaps::public-housing-buildings.
To learn more about Public Housing visit: https://www.hud.gov/program_offices/public_indian_housing/programs/ph/.
VITAL SIGNS INDICATOR List Rents (EC9)
FULL MEASURE NAME List Rents
LAST UPDATED October 2016
DESCRIPTION List rent refers to the advertised rents for available rental housing and serves as a measure of housing costs for new households moving into a neighborhood, city, county or region.
DATA SOURCE real Answers (1994 – 2015) no link
Zillow Metro Median Listing Price All Homes (2010-2016) http://www.zillow.com/research/data/
CONTACT INFORMATION vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator) List rent data reflect median rent prices advertised for available apartments rather than median rent payments; more information is available in the indicator definition above. Regional and local geographies rely on data collected by real Answers, a research organization and database publisher specializing in the multifamily housing market. real Answers focuses on collecting longitudinal data for individual rental properties through quarterly surveys. For the Bay Area, their database comprises properties with 40 to 3,000+ housing units. Median list prices most likely have an upward bias due to the exclusion of smaller properties. The bias may be most extreme in geographies where large rental properties represent a small portion of the overall rental market. A map of the individual properties surveyed is included in the Local Focus section.
Individual properties surveyed provided lower- and upper-bound ranges for the various types of housing available (studio, 1 bedroom, 2 bedroom, etc.). For the regional and county geographies, median lower- and upper-bound prices are determined across all housing types, and the median list price represented in Vital Signs is the average of these two medians. For the city geographies, median upper-bound prices are determined across all housing types, and the median list price represented in Vital Signs is the median upper-bound price. For simplicity, only the mean list rent is displayed for the individual properties. The metro area geography relies on Zillow data, which is the median price for rentals listed through www.zillow.com during the month. Like the real Answers data, Zillow's median list prices most likely have an upward bias, since small properties are underrepresented in Zillow's listings. The metro area data for the Bay Area cannot be compared to the regional Bay Area data. Due to the aforementioned data limitations, this data is suitable for analyzing the change in list rents over time but not necessarily for comparing absolute list rents. Metro area boundaries reflect today’s metro area definitions by county for consistency, rather than historical metro area boundaries.
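The two aggregation rules (region/county versus city) can be sketched as follows. The sketch assumes each surveyed property and housing type contributes one (lower, upper) advertised-rent pair, pooled across all housing types; the function names and rent figures are hypothetical.

```python
from statistics import median


def regional_list_rent(ranges):
    """Region/county rule: average of the median lower bound and median upper bound.

    `ranges` holds (lower, upper) advertised rent bounds, one per property
    and housing type, pooled across all housing types.
    """
    lowers = [lo for lo, _ in ranges]
    uppers = [hi for _, hi in ranges]
    return (median(lowers) + median(uppers)) / 2


def city_list_rent(ranges):
    """City rule: the median upper-bound price only."""
    return median([hi for _, hi in ranges])


ranges = [(1000, 1400), (1500, 1900), (2000, 2600)]  # hypothetical bounds
print(regional_list_rent(ranges))  # 1700.0
print(city_list_rent(ranges))      # 1900
```

Because the city figure uses only upper bounds, it will tend to sit above the regional/county figure for the same listings, which is another reason absolute levels are hard to compare across geographies.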
Due to the limited number of rental properties surveyed, city-level data is unavailable for Atherton, Belvedere, Brisbane, Calistoga, Clayton, Cloverdale, Cotati, Fairfax, Half Moon Bay, Healdsburg, Hillsborough, Los Altos Hills, Monte Sereno, Moraga, Oakley, Orinda, Portola Valley, Rio Vista, Ross, San Anselmo, San Carlos, Saratoga, Sebastopol, Windsor, Woodside, and Yountville.
Inflation-adjusted data are presented to illustrate how rents have grown relative to overall price increases; that said, the use of the Consumer Price Index does create some challenges, given that housing represents a major share of the consumer goods bundle used to calculate CPI. This reflects a methodological tradeoff between precision and accuracy and is a common concern when working with any commodity that is a major component of CPI itself. Percent change in the inflation-adjusted median is calculated with respect to the median price from the fourth quarter or December of the base year.
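The inflation adjustment and percent change described above can be sketched as a CPI ratio. This assumes a standard deflation (nominal × CPI_base / CPI_period) with the base taken from Q4 or December of the base year; the note does not say which CPI series Vital Signs uses, so the index values here are hypothetical, as are the function names.

```python
def to_base_dollars(nominal, cpi_period, cpi_base):
    """Express a nominal price in base-period dollars using a CPI ratio."""
    return nominal * cpi_base / cpi_period


def real_percent_change(median_now, cpi_now, median_base, cpi_base):
    """Percent change in the inflation-adjusted median relative to the base.

    `median_base` and `cpi_base` are assumed to refer to Q4 (or December)
    of the base year, matching the methodology note.
    """
    real_now = to_base_dollars(median_now, cpi_now, cpi_base)
    return 100.0 * (real_now - median_base) / median_base


# Hypothetical: rent rose from 1,000 to 1,500 while the CPI rose from 100 to 120.
print(real_percent_change(1500, 120, 1000, 100))  # 25.0
```

The nominal increase in this example is 50%, but only 25% after adjusting for a 20% rise in the price level.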
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains data described by the following dimensions (Not all combinations are available): Geography (13 items: Canada; Newfoundland and Labrador; Prince Edward Island; Atlantic provinces ...), Housing estimates (3 items: Housing starts; Housing under construction; Housing completions ...), Type of unit (6 items: Total units; Multiples; Single-detached; Semi-detached ...), Seasonal adjustment (2 items: Unadjusted; Seasonally adjusted at annual rates ...).
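"Seasonally adjusted at annual rates" means a seasonally adjusted monthly (or quarterly) figure scaled to what a full year would be at that pace. The sketch below assumes the usual convention of multiplying by the number of periods per year (12 for monthly data, 4 for quarterly); the function names and figures are hypothetical.

```python
def annualize(period_sa, periods_per_year=12):
    """Seasonally adjusted per-period count expressed at an annual rate."""
    return period_sa * periods_per_year


def de_annualize(saar, periods_per_year=12):
    """Seasonally adjusted annual rate converted back to a per-period level."""
    return saar / periods_per_year


print(annualize(20_000))      # 240000
print(de_annualize(240_000))  # 20000.0
```

Under this convention, monthly annual-rate values should not be summed across months; doing so would overstate the yearly total by a factor of 12.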
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Data on the number of residential properties sold, sale price and number of buyers by sale type, property type and period of construction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Assisted Living Waiver (ALW) eligible individuals are those who are enrolled in Medi-Cal and meet the level of care provided in a nursing facility due to their medical needs. Individuals with Medi-Cal benefits that include a share of cost may not enroll in the ALW. This dataset contains the provider number, provider legal name, provider business name, capacity per provider enrollment, provider physical location, provider counties and provider phone number of facilities enrolled in the ALW program. Data as of 1/1/2023
https://borealisdata.ca/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5683/SP/8OVPSZ
The Ontario Homeownership Index takes the pulse of Ontarians' perceptions of home ownership and real estate markets at the neighbourhood, city, and provincial level. Developed in partnership with the Ontario Real Estate Association (OREA), the Index gathers Ontarians' perceptions on the state of the economy, as well as their current housing situation and readiness to buy, sell, or rent. Respondents are queried on related topics such as their preferred type of home to purchase, perceptions on ownership, and opinions on realtors and the real estate profession. This survey has a sample of 1080 respondents and was performed online, May 20-23, 2014.
See the "Supportive Housing Locations Readme" file. This data set contains the locations of City-operated supportive housing within Toronto. The data were obtained from the Long-Term Care Homes and Services Division in a list format. The data are used for business operations to support Long-Term Care Homes and Services staff at each of these facilities. The data set is updated when a new facility is created or an existing one is no longer in use. Statistical data may be updated as they become available. The dataset is a City-wide GIS point file and is best utilized when overlaid with the City's Street Centreline file.
On an annual basis (calendar year), individual LTC facilities report facility-level data on services capacity, utilization, patients, and capital/equipment expenditures.