63 datasets found
  1. ‘California Housing Prices Data (5 new features!)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jul 28, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Prices Data (5 new features!)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-prices-data-5-new-features-230f/d4c4de7c/?iid=000-393&v=presentation
    Dataset updated
    Jul 28, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    Analysis of ‘California Housing Prices Data (5 new features!)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Similar Datasets:

    Boston House Prices: LINK

    Context

    This dataset is a modified version of the California Housing Data used in the paper Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297. It serves as an excellent introduction to implementing machine learning algorithms because it requires only rudimentary data cleaning, has an easily understandable list of variables, and sits at an optimal size between being too toyish and too cumbersome.

    The data contains information from the 1990 California census. So although it may not help you with predicting current housing prices like the Zillow Zestimate dataset, it does provide an accessible introductory dataset for teaching people about the basics of machine learning.

    Modifications with respect to the original data

    This dataset includes 5 extra features defined by me: "Distance to coast", "Distance to Los Angeles", "Distance to San Diego", "Distance to San Jose", and "Distance to San Francisco". These extra features try to account for the distance to the nearest coast and the distance to the centre of the largest cities in California.

    The distances were calculated using the Haversine formula with the Longitude and Latitude:

    $$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

    where:

    • $\varphi_1$ and $\varphi_2$ are the latitudes of point 1 and point 2, respectively
    • $\lambda_1$ and $\lambda_2$ are the longitudes of point 1 and point 2, respectively
    • $r$ is the radius of the Earth (6371 km)
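    The haversine distance described above can be sketched in Python as follows. This is a minimal illustration, not the author's original code; the function name `haversine_m` and the approximate city-centre coordinates in the usage example are assumptions chosen for illustration. The result is returned in metres to match the units of the distance columns.

```python
import math

EARTH_RADIUS_M = 6_371_000  # r = 6371 km, expressed in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    # math.sin / math.cos expect radians, so convert the degree inputs first.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    # Term under the square root in the haversine formula.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1 before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # Hypothetical approximate centres of Los Angeles and San Francisco (illustrative only).
    la = (34.0522, -118.2437)
    sf = (37.7749, -122.4194)
    print(f"{haversine_m(*la, *sf) / 1000:.0f} km")
```

    With these assumed coordinates the script should print a distance of roughly 560 km. The actual city-centre points used to build the dataset are not stated in the description.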

    Content

    The data pertains to the houses found in a given California district and some summary statistics about them based on the 1990 census data. The columns are as follows; their names are largely self-explanatory:

    1) Median House Value: Median house value for households within a block (measured in US Dollars) [$]
    2) Median Income: Median income for households within a block of houses (measured in tens of thousands of US Dollars) [10k$]
    3) Median Age: Median age of a house within a block; a lower number is a newer building [years]
    4) Total Rooms: Total number of rooms within a block
    5) Total Bedrooms: Total number of bedrooms within a block
    6) Population: Total number of people residing within a block
    7) Households: Total number of households (groups of people residing within a home unit) for a block
    8) Latitude: A measure of how far north a house is; a higher value is farther north [°]
    9) Longitude: A measure of how far west a house is; values are negative (degrees west of the prime meridian), so a lower, more negative value is farther west [°]
    10) Distance to coast: Distance to the nearest coast point [m]
    11) Distance to Los Angeles: Distance to the centre of Los Angeles [m]
    12) Distance to San Diego: Distance to the centre of San Diego [m]
    13) Distance to San Jose: Distance to the centre of San Jose [m]
    14) Distance to San Francisco: Distance to the centre of San Francisco [m]
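    Because the columns mix units (income in tens of thousands of dollars, distances in metres), it can help to normalise them before modelling. The sketch below assumes each row is a plain Python dict; the snake_case key names are hypothetical and may not match the headers in the actual CSV file.

```python
# Hypothetical column names; adjust them to the headers in the actual file.
DISTANCE_COLUMNS = (
    "distance_to_coast",
    "distance_to_la",
    "distance_to_san_diego",
    "distance_to_san_jose",
    "distance_to_san_francisco",
)


def normalise_units(row: dict) -> dict:
    """Return a copy of row with income in US dollars and distances in kilometres."""
    out = dict(row)
    # Median Income is stored in tens of thousands of US dollars [10k$].
    out["median_income"] = row["median_income"] * 10_000
    # Distance columns are stored in metres [m]; convert any that are present.
    for col in DISTANCE_COLUMNS:
        if col in row:
            out[col] = row[col] / 1000
    return out


if __name__ == "__main__":
    # Made-up example values, not taken from the dataset.
    sample = {"median_income": 3.5, "distance_to_coast": 12500.0, "median_house_value": 200000}
    print(normalise_units(sample))
```

    Columns that are not listed (such as the house value, already in dollars) are copied through unchanged.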

    Source

    This data was entirely modified and cleaned by me. The original data (without the distance features) was initially featured in the following paper: Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297.

    The original dataset can be found under the following link: https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

    --- Original source retains full ownership of the source dataset ---

  2. COVID-19 Vaccine Progress Dashboard Data by ZIP Code

    • data.ca.gov
    • data.chhs.ca.gov
    csv, xlsx, zip
    Updated Jun 25, 2025
    Cite
    California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data by ZIP Code [Dataset]. https://data.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data-by-zip-code
    Available download formats: csv, xlsx, zip
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

    Starting on July 13, 2022, the denominator for calculating vaccine coverage was changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously, the denominator was changed from age 16+ to age 12+ on May 18, 2021, and then from age 12+ to age 5+ on November 10, 2021, to reflect earlier changes in vaccine eligibility criteria. The previous datasets based on the age 12+ and age 5+ denominators have been uploaded as archived tables.
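    The denominator timeline above can be encoded as a small lookup, for example when comparing coverage figures across reporting dates. This is a sketch, not CDPH code: the function name is hypothetical, and treating each change date as the first day of the new denominator is an assumption.

```python
from datetime import date


def coverage_denominator(as_of: date) -> str:
    """Age group used as the coverage denominator on a given reporting date."""
    # Change dates are taken from the dataset notes; treating each one as
    # inclusive (the new denominator applies from that day) is an assumption.
    if as_of >= date(2022, 7, 13):
        return "all ages"
    if as_of >= date(2021, 11, 10):
        return "5+"
    if as_of >= date(2021, 5, 18):
        return "12+"
    return "16+"


if __name__ == "__main__":
    print(coverage_denominator(date(2021, 1, 5)))
```

    Checks are ordered from the most recent change backwards, so each date falls into exactly one period.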

    Starting June 30, 2021, the dataset was reconfigured so that all updates are appended to a single dataset, which simplifies access through the API and other interfaces. In addition, historical data was extended back to January 5, 2021.

    This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.

    This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a Vaccine Equity Metric (VEM) score.

    This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.

    The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.

    These data do not include doses administered by the following federal agencies that received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

    For some ZCTAs, vaccination coverage may exceed 100%. This can happen when many people from outside the county travel to that ZCTA to get vaccinated and providers report the county of administration as the county of residence, and/or when the California Department of Finance (DOF) population estimates for that ZCTA are too low. Please note that DOF population numbers are projections and may therefore be inaccurate, especially given the unprecedented population shifts caused by the pandemic.
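    The arithmetic behind coverage above 100% is simple: coverage divides the number of people vaccinated by a population estimate, so an underestimated denominator (or residents counted in the wrong area) pushes the ratio past 100%. The sketch below illustrates that check; the function names are hypothetical and this is not the dashboard's actual calculation code.

```python
def coverage_pct(people_vaccinated: int, population_estimate: int) -> float:
    """Percentage of the estimated population that is vaccinated."""
    if population_estimate <= 0:
        raise ValueError("population estimate must be positive")
    return 100.0 * people_vaccinated / population_estimate


def exceeds_estimate(people_vaccinated: int, population_estimate: int) -> bool:
    """Flag areas whose coverage exceeds 100%, e.g. because the denominator is too low."""
    return coverage_pct(people_vaccinated, population_estimate) > 100.0


if __name__ == "__main__":
    # Made-up counts for illustration.
    print(coverage_pct(1200, 1000), exceeds_estimate(1200, 1000))
```

    Flagged areas are candidates for a closer look at the population estimate rather than evidence of errors in the dose counts.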

  3. Low Income Population Concentration - Southern CA

    • wifire-data.sdsc.edu
    geotiff, wcs, wms
    Updated Mar 25, 2025
    Cite
    California Wildfire & Forest Resilience Task Force (2025). Low Income Population Concentration - Southern CA [Dataset]. https://wifire-data.sdsc.edu/dataset/clm-low-income-population-concentration-southern-ca
    Available download formats: wms, geotiff, wcs
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    California Wildfire & Forest Resilience Task Force
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Southern California, California
    Description

    Relative concentration of the estimated number of people in the Southern California region that live in a household defined as "low income." There are multiple ways to define low income. These data apply the most common standard: the low-income population consists of all members of households whose combined income is less than twice the federal poverty threshold for their household type. Household type refers to the household's resident composition: the number of independent adults plus dependents, who can be of any age, from children to elderly. For example, a household of four people (one working adult parent and three dependent children) has a different poverty threshold than a household composed of four unrelated independent adults.
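    The low-income rule above (household income below twice the poverty threshold for that household type, with every member of such a household counted) can be sketched as follows. The `Household` type and the threshold values in the example are hypothetical placeholders, not official federal poverty thresholds; this is not the task force's processing code.

```python
from dataclasses import dataclass


@dataclass
class Household:
    size: int                 # number of residents in the household
    income: float             # combined household income, USD
    poverty_threshold: float  # federal threshold for this household type, USD (placeholder)


def is_low_income(h: Household) -> bool:
    """Low income: combined income strictly below twice the applicable threshold."""
    return h.income < 2 * h.poverty_threshold


def low_income_population(households: list[Household]) -> int:
    """Count every member of every low-income household."""
    return sum(h.size for h in households if is_low_income(h))


if __name__ == "__main__":
    # Placeholder thresholds for illustration only.
    sample = [Household(4, 50_000, 30_000), Household(2, 70_000, 20_000)]
    print(low_income_population(sample))
```

    Counting people rather than households is what makes the measure a population concentration: a four-person low-income household contributes four people.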

    Due to high estimate uncertainty for many block group estimates of the number of people living in low income households, some records cannot be reliably assigned a class and class code comparable to those assigned to race/ethnicity data from the decennial Census.

    "Relative concentration" is a measure that compares the proportion of the population within each Census block group data unit that lives in low-income households to the corresponding proportion of all people that live within the 13,312 block groups in the Southern California RRK region. See the "Data Units" description below for how these relative concentrations are broken into categories in this "low income" metric.

  4. Asian Population Concentration - Southern CA

    • wifire-data.sdsc.edu
    geotiff, wcs, wms
    Updated Mar 25, 2025
    Cite
    California Wildfire & Forest Resilience Task Force (2025). Asian Population Concentration - Southern CA [Dataset]. https://wifire-data.sdsc.edu/dataset/clm-asian-population-concentration-southern-ca
    Available download formats: wcs, geotiff, wms
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    California Wildfire & Forest Resilience Task Force
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Southern California, California
    Description

    Relative concentration of the Southern California region's Asian American population. The variable ASIANALN records all individuals who select Asian as their SOLE racial identity in response to the Census questionnaire, regardless of their response to the Hispanic ethnicity question. Respondents who identify as either Hispanic or non-Hispanic can therefore be counted as Asian alone.

    "Relative concentration" is a measure that compares the proportion of population within each Census block group data unit that identify as ASIANALN alone to the proportion of all people that live within the 13,312 block groups in the Southern California RRK region that identify as ASIANALN alone. Example: if 5.2% of people in a block group identify as ASIANALN, the block group has twice the proportion of ASIANALN individuals compared to the Southern California RRK region (2.6%), and more than three times the proportion compared to the entire state of California (1.6%). If the local proportion is twice the regional proportion, then ASIANALN individuals are highly concentrated locally.
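    The relative concentration in the example above is a ratio of two shares: the block group's share divided by the reference share. A minimal sketch follows; the function names are hypothetical, and the 2.0 cut-off simply mirrors the "twice the regional proportion" illustration in the text, not the dataset's official category breaks (described in its "Data Units" section).

```python
# Illustrative cut-off taken from the text's "twice the regional proportion" example.
HIGH_CONCENTRATION_RATIO = 2.0


def relative_concentration(local_share: float, reference_share: float) -> float:
    """Local share divided by reference share (both in the same units, e.g. percent)."""
    if reference_share <= 0:
        raise ValueError("reference share must be positive")
    return local_share / reference_share


def is_highly_concentrated(local_share: float, regional_share: float) -> bool:
    return relative_concentration(local_share, regional_share) >= HIGH_CONCENTRATION_RATIO


if __name__ == "__main__":
    # Figures from the example: 5.2% locally, 2.6% regionally, 1.6% statewide.
    print(relative_concentration(5.2, 2.6), relative_concentration(5.2, 1.6))
```

    The same ratio applies to each population variable in this series; only the group being counted changes.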

  5. Death Profiles by County

    • data.chhs.ca.gov
    • data.ca.gov
    csv, zip
    Updated May 28, 2025
    Cite
    California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county
    Available download formats: csv, zip
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    California Department of Public Health
    Description

    This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

    The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
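    The difference between counting deaths "by occurrence" and "by residence" can be illustrated with record-level data. The published tables are already aggregated, so this is only a sketch of the distinction; the `DeathRecord` fields are hypothetical, with `None` standing for "outside California" (occurrence) or "not a California resident" (residence).

```python
from collections import Counter
from typing import NamedTuple, Optional


class DeathRecord(NamedTuple):
    occurrence_county: Optional[str]  # None if the death occurred outside California
    residence_county: Optional[str]   # None if the decedent was not a California resident


def counts_by_occurrence(records: list[DeathRecord]) -> Counter:
    """Deaths that occurred in each county, regardless of where the decedent lived."""
    return Counter(r.occurrence_county for r in records if r.occurrence_county is not None)


def counts_by_residence(records: list[DeathRecord]) -> Counter:
    """Deaths to residents of each county, including out-of-state deaths."""
    return Counter(r.residence_county for r in records if r.residence_county is not None)


if __name__ == "__main__":
    # Made-up records for illustration.
    sample = [DeathRecord("Alameda", "Fresno"), DeathRecord(None, "Fresno")]
    print(counts_by_occurrence(sample), counts_by_residence(sample))
```

    A death of a resident that occurs out of state appears only in the by-residence count, which is why only the final tables (not the provisional, occurrence-only table) can capture it.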

    The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.

  6. University of Southern California (USC) Understanding America Study

    • gimi9.com
    • catalog.data.gov
    Updated Apr 2, 2025
    Cite
    (2025). University of Southern California (USC) Understanding America Study [Dataset]. https://gimi9.com/dataset/data-gov_university-of-southern-california-usc-understanding-america-study/
    Dataset updated
    Apr 2, 2025
    Area covered
    United States, Southern California, California
    Description

    "The Social Security Administration (SSA) suggested to USC to survey members of the public around these topics: What do people know about Social Security? How do people learn about Social Security and how do they want to learn about Social Security? How do adults use financial products as they age? How do adults make their financial decisions and where do they turn for advice? What are adults' main sources of financial stress? The results of the survey are available at the USC website below after logging in and being granted access by USC."

  7. Black and African American Population Concentration - Southern CA

    • wifire-data.sdsc.edu
    geotiff, wcs, wms
    Updated Mar 25, 2025
    Cite
    California Wildfire & Forest Resilience Task Force (2025). Black and African American Population Concentration - Southern CA [Dataset]. https://wifire-data.sdsc.edu/dataset/clm-black-and-african-american-population-concentration-southern-ca
    Available download formats: geotiff, wms, wcs
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    California Wildfire & Forest Resilience Task Force
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Southern California, California
    Description

    Relative concentration of the Southern California region's Black/African American population. The variable BLACKALN records all individuals who select Black or African American as their SOLE racial identity in response to the Census questionnaire, regardless of their response to the Hispanic ethnicity question. Respondents who identify as either Hispanic or non-Hispanic can therefore be counted as Black alone.

    "Relative concentration" is a measure that compares the proportion of population within each Census block group data unit that identify as Black/African American alone to the proportion of all people that live within the 13,312 block groups in the Southern California RRK region that identify as Black/African American alone. Example: if 5.2% of people in a block group identify as BLACKALN, the block group has twice the proportion of BLACKALN individuals compared to the Southern California RRK region (2.6%), and more than three times the proportion compared to the entire state of California (1.6%). If the local proportion is twice the regional proportion, then BLACKALN individuals are highly concentrated locally.

  8. South Gate, CA Population Breakdown By Race (Excluding Ethnicity) Dataset:...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    Cite
    Neilsberg Research (2025). South Gate, CA Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/south-gate-ca-population-by-race/
    Available download formats: csv, json
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Gate, California
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. We ensured that the population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of South Gate by race. It includes the population of South Gate across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of South Gate across relevant racial categories.

    Key observations

    The percent distribution of South Gate population by race (across all racial categories recognized by the U.S. Census Bureau): 23.87% are white, 1.02% are Black or African American, 1.75% are American Indian and Alaska Native, 0.79% are Asian, 0.08% are Native Hawaiian and other Pacific Islander, 44.97% are some other race and 27.51% are multiracial.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for South Gate.
    • Population: This column shows the population of each racial category (excluding ethnicity) in South Gate.
    • % of Total Population: This column displays each race's share of South Gate's total population. Please note that the percentages may not sum to exactly 100% due to rounding.
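    The rounding caveat can be checked directly against the key observations listed earlier for this dataset: adding up the seven published percentages gives 99.99 rather than 100. The dictionary below simply restates those published figures.

```python
# Published percentages from the "Key observations" for South Gate.
south_gate_pct = {
    "White": 23.87,
    "Black or African American": 1.02,
    "American Indian and Alaska Native": 1.75,
    "Asian": 0.79,
    "Native Hawaiian and Other Pacific Islander": 0.08,
    "Some other race": 44.97,
    "Two or more races": 27.51,
}

# Round the float sum to two decimals to remove binary floating-point noise.
total = round(sum(south_gate_pct.values()), 2)
print(total)  # -> 99.99
```

    The 0.01 shortfall comes from each category being rounded independently to two decimal places.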

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for South Gate Population by Race & Ethnicity. You can refer to it here.

  9. Hispanic and Latino Population Concentration - Southern CA

    • wifire-data.sdsc.edu
    geotiff, wcs, wms
    Updated Mar 25, 2025
    Cite
    California Wildfire & Forest Resilience Task Force (2025). Hispanic and Latino Population Concentration - Southern CA [Dataset]. https://wifire-data.sdsc.edu/dataset/clm-hispanic-and-latino-population-concentration-southern-ca
    Available download formats: wms, wcs, geotiff
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    California Wildfire & Forest Resilience Task Force
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Southern California, California
    Description

    Relative concentration of the Southern California region's Hispanic/Latino population. The variable HISPANIC records all individuals who select Hispanic or Latino in response to the Census questionnaire, regardless of their response to the racial identity question.

    "Relative concentration" is a measure that compares the proportion of population within each Census block group data unit that identify as HISPANIC to the proportion of all people that live within the 13,312 block groups in the Southern California RRK region that identify as HISPANIC. Example: if 5.2% of people in a block group identify as HISPANIC, the block group has twice the proportion of HISPANIC individuals compared to the Southern California RRK region (2.6%), and more than three times the proportion compared to the entire state of California (1.6%). If the local proportion is twice the regional proportion, then HISPANIC individuals are highly concentrated locally.

  10. Dataset for South Pasadena, CA Census Bureau Income Distribution by Gender

    • neilsberg.com
    Updated Jan 9, 2024
    Cite
    Neilsberg Research (2024). Dataset for South Pasadena, CA Census Bureau Income Distribution by Gender [Dataset]. https://www.neilsberg.com/research/datasets/b3d3f0b5-abcb-11ee-8b96-3860777c1fe6/
    Dataset updated
    Jan 9, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Pasadena, California
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates South Pasadena household income by gender. It can be utilized to understand the gender-based income distribution in South Pasadena.

    Content

    When applicable, the dataset includes the following datasets:

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • South Pasadena, CA annual median income by work experience and sex dataset : Aged 15+, 2010-2022 (in 2022 inflation-adjusted dollars)
    • South Pasadena, CA annual income distribution by work experience and gender dataset (Number of individuals ages 15+ with income, 2021)

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of South Pasadena income distribution by gender. You can refer to it here.

  11. Hispanic and or Black, Indigenous or People of Color (Hspbipoc) Population...

    • wifire-data.sdsc.edu
    geotiff, wcs, wms
    Updated Mar 25, 2025
    Cite
    California Wildfire & Forest Resilience Task Force (2025). Hispanic and or Black, Indigenous or People of Color (Hspbipoc) Population Concentration - Southern CA [Dataset]. https://wifire-data.sdsc.edu/dataset/clm-hispanic-and-or-black-indigenous-or-people-of-color-hspbipoc-population-concentration-southern-c
    Available download formats: wms, wcs, geotiff
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    California Wildfire & Forest Resilience Task Force
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Southern California, California
    Description

    Relative concentration of the Southern California region's Hispanic and/or Black, Indigenous, or People of Color (HSPBIPOC) population. The variable HSPBIPOC is equivalent to all individuals who select any combination of racial and ethnic identity in response to the Census questionnaire EXCEPT those who select both "not Hispanic" for the ethnic identity question and "white race alone" for the racial identity question. This is the most encompassing possible definition of racial and ethnic identities that may be associated with historic underservice by agencies, or that may be more likely to express environmental justice concerns (as compared to predominantly non-Hispanic white communities). Until 2021, federal agency guidance for considering the environmental justice impacts of proposed actions focused on how the actions affected "racial or ethnic minorities." "Racial minority" is an increasingly meaningless concept in the USA, and particularly so in California, where only about 3/8 of the state's population identifies as non-Hispanic and white race alone; a clear majority of Californians identify as Hispanic and/or not white. Because many federal and state map screening tools continue to rely on "minority population" as an indicator for flagging potentially vulnerable, disadvantaged, or underserved populations, our analysis includes the variable HSPBIPOC, which is effectively the "all minority" population according to the now-outdated federal environmental justice direction. A more meaningful analysis of the potential impact of forest management actions on specific populations considers racial or ethnic populations individually: e.g., all people identifying as Hispanic regardless of race; all people identifying as American Indian, regardless of Hispanic ethnicity; etc.
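    The HSPBIPOC definition above is a complement: everyone is included except people who are both non-Hispanic and White alone. A minimal sketch of that rule, at the person level and in aggregate, follows. The function names and the representation of race as a set of category labels are hypothetical and do not reflect how Census microdata is actually coded.

```python
def is_hspbipoc(hispanic: bool, races: frozenset) -> bool:
    """True unless the person is non-Hispanic AND reports White as their only race."""
    return hispanic or races != frozenset({"White"})


def hspbipoc_count(total_population: int, non_hispanic_white_alone: int) -> int:
    """Aggregate form of the same rule: total minus the non-Hispanic, White-alone population."""
    return total_population - non_hispanic_white_alone


if __name__ == "__main__":
    # Made-up example respondents.
    print(is_hspbipoc(False, frozenset({"White"})))           # non-Hispanic, White alone
    print(is_hspbipoc(False, frozenset({"White", "Asian"})))  # multiracial
```

    Because the rule excludes exactly one combination of answers, the aggregate count needs only the total population and the non-Hispanic, White-alone count.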

    "Relative concentration" is a measure that compares the proportion of population within each Census block group data unit that identify as HSPBIPOC alone to the proportion of all people that live within the 13,312 block groups in the Southern California RRK region that identify as HSPBIPOC alone. Example: if 5.2% of people in a block group identify as HSPBIPOC, the block group has twice the proportion of HSPBIPOC individuals compared to the Southern California RRK region (2.6%), and more than three times the proportion compared to the entire state of California (1.6%). If the local proportion is twice the regional proportion, then HSPBIPOC individuals are highly concentrated locally.

  12. COVID-19 Vaccine Progress Dashboard Data by ZIP Code

    • catalog.data.gov
    Updated Nov 27, 2024
    California Department of Public Health (2024). COVID-19 Vaccine Progress Dashboard Data by ZIP Code [Dataset]. https://catalog.data.gov/dataset/covid-19-vaccine-progress-dashboard-data-by-zip-code-edd80
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

    Starting on July 13, 2022, the denominator for calculating vaccine coverage was changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously, the denominator was changed from age 16+ to age 12+ on May 18, 2021, and then from age 12+ to age 5+ on November 10, 2021, to reflect earlier changes in vaccine eligibility criteria. The previous datasets based on the age 12+ and age 5+ denominators have been uploaded as archived tables.

    Starting June 30, 2021, the dataset was reconfigured so that all updates are appended to a single dataset, which simplifies access through the API and other interfaces. In addition, historical data has been extended back to January 5, 2021.

    This dataset shows full, partial, and at-least-one-dose coverage rates by ZIP Code Tabulation Area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey's 2015-2019 5-Year data. This is the data table for the LHJ Vaccine Equity Performance dashboard; however, it also includes ZCTAs that do not have a Vaccine Equity Metric (VEM) score.

    Where applicable, the dataset also includes VEM score quartiles, which combine the Public Health Alliance of Southern California's Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that affect health, such as income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to healthier community conditions in Quartile 4. The VEM is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered indicative of the HPI score for these ZIP codes. CDPH-derived quartiles were assigned to ZIP codes that the Public Health Alliance of Southern California excluded from the HPI score because of concerns about statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.

    These data do not include doses administered by the following federal agencies, which received vaccine allocated directly from the CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

    For some ZCTAs, vaccination coverage may exceed 100%. This may result from many people from outside the county coming to that ZCTA to get vaccinated and providers reporting the county of administration as the county of residence, and/or from California Department of Finance (DOF) population estimates for that ZCTA being too low. Please note that population numbers provided by DOF are projections and may not be accurate, especially given the unprecedented shifts in population during the pandemic.
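    The coverage rates described above are a count of vaccinated people divided by a population denominator, which is why residence misattribution or a low population estimate can push a ZCTA above 100%. The Python sketch below computes a rate and flags ZCTAs above 100% for review. The record type and field names are hypothetical and do not reflect the dataset's actual column names, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ZctaCoverage:
    # Hypothetical fields; check the published data dictionary for real column names.
    zcta: str
    persons_vaccinated: int
    population: int  # denominator (all ages since July 13, 2022)


def coverage_rate(persons_vaccinated: int, population: int) -> Optional[float]:
    """Return vaccinated / population, or None when the denominator is not positive."""
    if population <= 0:
        return None
    return persons_vaccinated / population


def zctas_over_100_percent(rows: Iterable[ZctaCoverage]) -> List[str]:
    """List ZCTAs whose computed coverage exceeds 100% so they can be reviewed."""
    flagged = []
    for row in rows:
        rate = coverage_rate(row.persons_vaccinated, row.population)
        if rate is not None and rate > 1.0:
            flagged.append(row.zcta)
    return flagged


rows = [
    ZctaCoverage("zcta-a", 45_000, 60_000),  # made-up numbers for illustration
    ZctaCoverage("zcta-b", 12_500, 11_000),
]
print(zctas_over_100_percent(rows))  # ['zcta-b']
```

    Flagging rather than capping at 100% keeps the raw rate visible, since the notes attribute such values to reporting and denominator issues rather than to true coverage.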

  13. Live Birth Profiles by County

    • data.chhs.ca.gov
    • healthdata.gov
    csv, zip
    Updated May 28, 2025
    California Department of Public Health (2025). Live Birth Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/live-birth-profiles-by-county
    Available download formats: csv (1911), csv (8256822), csv (429423), zip, csv (9986780)
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    This dataset contains counts of live births for California counties based on information entered on birth certificates. Final counts are derived from static data and include out-of-state births to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data were retrieved and may not represent all births that occurred during the time period.

    The final data tables include both births that occurred in California regardless of the place of residence (by occurrence) and births to California residents (by residence), whereas the provisional data table only includes births that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by parent giving birth's age, parent giving birth's race-ethnicity, and birth place type. See temporal coverage for more information on which strata are available for which years.
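    The difference between the by-occurrence and by-residence tables can be made concrete with a toy tally. In the Python sketch below, each birth is reduced to two hypothetical flags (the published tables are pre-aggregated counts, not records like these): a California birth to a non-resident counts only toward occurrence, and an out-of-state birth to a California resident counts only toward residence.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Birth:
    # Hypothetical, simplified record used only to illustrate the two tabulations.
    occurred_in_california: bool
    parent_resides_in_california: bool


def count_by_occurrence(births: Iterable[Birth]) -> int:
    """Births that occurred in California, regardless of the parent's residence."""
    return sum(1 for b in births if b.occurred_in_california)


def count_by_residence(births: Iterable[Birth]) -> int:
    """Births to California residents, including those that occurred out of state."""
    return sum(1 for b in births if b.parent_resides_in_california)


births = [
    Birth(True, True),    # in-state birth to a resident: counted in both tables
    Birth(True, False),   # in-state birth to a non-resident: occurrence only
    Birth(False, True),   # out-of-state birth to a resident: residence only
]
print(count_by_occurrence(births), count_by_residence(births))  # 2 2
```

    Per the description, the provisional table corresponds only to the by-occurrence tally, so provisional and final residence-based counts are not directly comparable.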

  14. South El Monte, CA annual income distribution by work experience and gender...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    Neilsberg Research (2025). South El Monte, CA annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/south-el-monte-ca-income-by-gender/
    Available download formats: json, csv
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South El Monte, California
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To count the individuals of each gender (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and by employment type, full-time (FT) or part-time (PT), offering insight into the diverse income landscape within South El Monte. It can be used to study gender-based income distribution within the South El Monte population, aiding data analysis and decision-making.

    Key observations

    • Employment patterns: Within South El Monte, among individuals aged 15 years and older with income, there were 6,388 men and 5,988 women in the workforce. Among them, 3,415 men were engaged in full-time, year-round employment, while 2,331 women were in full-time, year-round roles.
    • Annual income under $24,999: Of the male population working full-time, 10.83% fell within the income range of under $24,999, while 18.83% of the female population working full-time was represented in the same income bracket.
    • Annual income above $100,000: 7.17% of men in full-time roles earned incomes exceeding $100,000, while 5.41% of women in full-time positions earned within this income bracket.
    • Refer to the research insights for additional key observations on more income brackets (annual income under $24,999, between $25,000 and $49,999, between $50,000 and $74,999, between $75,000 and $99,999, and above $100,000) and employment types (full-time year-round and part-time).
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income brackets:

    • $1 to $2,499 or loss
    • $2,500 to $4,999
    • $5,000 to $7,499
    • $7,500 to $9,999
    • $10,000 to $12,499
    • $12,500 to $14,999
    • $15,000 to $17,499
    • $17,500 to $19,999
    • $20,000 to $22,499
    • $22,500 to $24,999
    • $25,000 to $29,999
    • $30,000 to $34,999
    • $35,000 to $39,999
    • $40,000 to $44,999
    • $45,000 to $49,999
    • $50,000 to $54,999
    • $55,000 to $64,999
    • $65,000 to $74,999
    • $75,000 to $99,999
    • $100,000 or more
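    Because the 20 brackets above are contiguous, assigning an individual income to a bracket is a sorted lookup on the brackets' lower bounds. The Python sketch below uses the standard-library `bisect` module; the function name is hypothetical, and routing everything below $2,500 (including losses) to the first bracket is an assumption based on the "$1 to $2,499 or loss" label. Zero income falls outside the universe of "individuals with income" and is not meaningful here.

```python
from bisect import bisect_right

# Lower bound (in dollars) of each income bracket listed above.
LOWER_BOUNDS = [
    1, 2_500, 5_000, 7_500, 10_000, 12_500, 15_000, 17_500, 20_000, 22_500,
    25_000, 30_000, 35_000, 40_000, 45_000, 50_000, 55_000, 65_000, 75_000, 100_000,
]
LABELS = [
    "$1 to $2,499 or loss", "$2,500 to $4,999", "$5,000 to $7,499", "$7,500 to $9,999",
    "$10,000 to $12,499", "$12,500 to $14,999", "$15,000 to $17,499", "$17,500 to $19,999",
    "$20,000 to $22,499", "$22,500 to $24,999", "$25,000 to $29,999", "$30,000 to $34,999",
    "$35,000 to $39,999", "$40,000 to $44,999", "$45,000 to $49,999", "$50,000 to $54,999",
    "$55,000 to $64,999", "$65,000 to $74,999", "$75,000 to $99,999", "$100,000 or more",
]
assert len(LOWER_BOUNDS) == len(LABELS) == 20


def income_bracket(annual_income: float) -> str:
    """Map an annual income in dollars to its bracket label.

    Assumption: anything below $2,500, including losses (negative income),
    falls into the first bracket.
    """
    index = bisect_right(LOWER_BOUNDS, annual_income) - 1
    return LABELS[max(index, 0)]


print(income_bracket(24_999))   # $22,500 to $24,999
print(income_bracket(250_000))  # $100,000 or more
```

    Note that the bracket widths are uneven ($2,500 up to $25,000, then $5,000, then wider), so bracket counts should not be compared as if they were equal-width bins.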

    Variables / Data Columns

    • Income Bracket: This column lists the 20 income brackets, ranging from $1 to $100,000+.
    • Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket
    • Part-Time Males: The count of males employed part-time and earning within a specified income bracket
    • Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket
    • Part-Time Females: The count of females employed part-time and earning within a specified income bracket
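    Percentages like those in the key observations (e.g., the share of full-time men earning under $24,999) can be read as one column's counts summed over a range of brackets and divided by that column's total. The Python sketch below assumes a column is held as a mapping from each bracket's lower bound in dollars to its count; the function names are hypothetical and the counts are invented, not taken from the dataset.

```python
from typing import Callable, Mapping


def _share(column: Mapping[int, int], keep: Callable[[int], bool]) -> float:
    """Fraction of the column's total whose bracket lower bound satisfies `keep`."""
    total = sum(column.values())
    if total <= 0:
        raise ValueError("column contains no workers")
    return sum(n for lower, n in column.items() if keep(lower)) / total


def share_below(column: Mapping[int, int], threshold: int) -> float:
    """Share of workers in brackets whose lower bound is below `threshold`."""
    return _share(column, lambda lower: lower < threshold)


def share_at_or_above(column: Mapping[int, int], threshold: int) -> float:
    """Share of workers in brackets whose lower bound is at or above `threshold`."""
    return _share(column, lambda lower: lower >= threshold)


# Invented counts for a "Full-Time Males" column, keyed by bracket lower bound.
full_time_males = {1: 5, 22_500: 15, 25_000: 30, 75_000: 20, 100_000: 30}
print(f"{share_below(full_time_males, 25_000):.2%} under $24,999")
print(f"{share_at_or_above(full_time_males, 100_000):.2%} at $100,000 or more")
```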

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.
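    The two employment-type definitions above translate into a small rule. The Python sketch below assumes a worker is described by usual hours per week and weeks worked in the previous calendar year (hypothetical inputs, not dataset columns). Read literally, the definitions leave a gap: someone who worked 35 or more hours per week for fewer than 50 weeks is neither full-time year-round nor part-time, so the function returns None for that case.

```python
from typing import Optional


def employment_type(usual_hours_per_week: float, weeks_worked: int) -> Optional[str]:
    """Classify a worker using the full-time, year-round and part-time definitions.

    Returns None when neither definition applies: the person did not work,
    or worked 35+ hours per week for fewer than 50 weeks.
    """
    if usual_hours_per_week <= 0 or weeks_worked <= 0:
        return None  # did not work during the previous calendar year
    if usual_hours_per_week >= 35 and weeks_worked >= 50:
        return "full-time, year-round"
    if usual_hours_per_week < 35:
        return "part-time"
    return None


print(employment_type(40, 52))  # full-time, year-round
print(employment_type(20, 52))  # part-time
print(employment_type(40, 30))  # None
```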

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for South El Monte median household income by race. You can find it here.

  15. South Pasadena, CA annual income distribution by work experience and gender...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    Neilsberg Research (2025). South Pasadena, CA annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/bac67c89-f4ce-11ef-8577-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Pasadena, California
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To count the individuals of each gender (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and by employment type, full-time (FT) or part-time (PT), offering insight into the diverse income landscape within South Pasadena. It can be used to study gender-based income distribution within the South Pasadena population, aiding data analysis and decision-making.

    Key observations

    • Employment patterns: Within South Pasadena, among individuals aged 15 years and older with income, there were 9,163 men and 9,252 women in the workforce. Among them, 5,959 men were engaged in full-time, year-round employment, while 5,405 women were in full-time, year-round roles.
    • Annual income under $24,999: Of the male population working full-time, 4.72% fell within the income range of under $24,999, while 6.90% of the female population working full-time was represented in the same income bracket.
    • Annual income above $100,000: 59.86% of men in full-time roles earned incomes exceeding $100,000, while 39.22% of women in full-time positions earned within this income bracket.
    • Refer to the research insights for additional key observations on more income brackets (annual income under $24,999, between $25,000 and $49,999, between $50,000 and $74,999, between $75,000 and $99,999, and above $100,000) and employment types (full-time year-round and part-time).
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income brackets:

    • $1 to $2,499 or loss
    • $2,500 to $4,999
    • $5,000 to $7,499
    • $7,500 to $9,999
    • $10,000 to $12,499
    • $12,500 to $14,999
    • $15,000 to $17,499
    • $17,500 to $19,999
    • $20,000 to $22,499
    • $22,500 to $24,999
    • $25,000 to $29,999
    • $30,000 to $34,999
    • $35,000 to $39,999
    • $40,000 to $44,999
    • $45,000 to $49,999
    • $50,000 to $54,999
    • $55,000 to $64,999
    • $65,000 to $74,999
    • $75,000 to $99,999
    • $100,000 or more

    Variables / Data Columns

    • Income Bracket: This column lists the 20 income brackets, ranging from $1 to $100,000+.
    • Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket
    • Part-Time Males: The count of males employed part-time and earning within a specified income bracket
    • Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket
    • Part-Time Females: The count of females employed part-time and earning within a specified income bracket

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for South Pasadena median household income by race. You can find it here.

  16. 20 Richest Counties in California

    • california-demographics.com
    Updated Jun 20, 2024
    Kristen Carney (2024). 20 Richest Counties in California [Dataset]. https://www.california-demographics.com/counties_by_population
    Dataset updated
    Jun 20, 2024
    Dataset provided by
    Cubit Planning, Inc.
    Authors
    Kristen Carney
    License

    https://www.california-demographics.com/terms_and_conditions

    Area covered
    California
    Description

    A dataset listing California counties by population for 2024.

  17. Data from: People With Dementia as Witnesses to Emotional Events in Southern...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    National Institute of Justice (2025). People With Dementia as Witnesses to Emotional Events in Southern California, 2008-2009 [Dataset]. https://catalog.data.gov/dataset/people-with-dementia-as-witnesses-to-emotional-events-in-southern-california-2008-2009
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Area covered
    California
    Description

    This study sought evidence that a subset of people with dementia (PwD) have reliable memory for emotional events in their own lives, and that these people differ from PwD whose memory for emotional life events is less reliable or unreliable in their disease stage, confabulation and neuropsychiatric behaviors, and awareness of their cognitive impairment. A cross-sectional study of 93 people with mild or moderate dementia (aged 55 and older) and a comparison group of 50 older adults was conducted. Memories of recent autobiographical events with both positive and negative emotional content were elicited during a structured interview designed for consistency with accepted forensic interviewing techniques. Accurate recollection of these events was independently verified by a non-demented informant, usually a family member. In addition, both members of each dyad were interviewed independently to assess other characteristics of the PwD: demographics, depressive symptoms, functional and cognitive abilities, medications, health conditions, behaviors, and characteristics of the dyadic relationship. Researchers also assessed PwD for disease stage, awareness of cognitive impairment, and episodic memory. A validated test of emotionally influenced memory was administered to qualified participants to verify the novel structured interviewing assessment developed for this study. Two researchers conducted the study assessments during home visits. The data file contains 945 cases and 732 variables.

  18. American Indian or Alaska Native Race Alone and Multi-Race Population...

    • wifire-data.sdsc.edu
    geotiff, wcs, wms
    Updated Mar 25, 2025
    California Wildfire & Forest Resilience Task Force (2025). American Indian or Alaska Native Race Alone and Multi-Race Population Concentration - Southern CA [Dataset]. https://wifire-data.sdsc.edu/dataset/clm-american-indian-or-alaska-native-race-alone-and-multi-race-population-concentration-southern-ca
    Available download formats: wms, wcs, geotiff
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    California Wildfire & Forest Resilience Task Force
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Alaska, United States, Southern California, California
    Description

    Relative concentration of the Southern California region's American Indian population. The variable AIAN_ALN_AND_MULTIRACEAIANALN includes BOTH individuals who select American Indian or Alaska Native as their sole racial identity (they identify only as American Indian) AND individuals who select American Indian / Alaska Native as one of two or more racial identities (they partly identify as American Indian) in response to the Census questionnaire. IMPORTANT: self-reported ancestry and Tribal membership are distinct identities, and one does not automatically imply the other. These data should not be interpreted as a distribution of "Tribal people." Numerous Rancherias in the Southern California region account for the wide distribution of very high to extremely high concentrations of American Indians.

    "Relative concentration" is a measure that compares the proportion of the population within each Census block group that identifies as American Indian / Alaska Native alone to the proportion of all people living within the 13,312 block groups of the Southern California RRK region who identify as American Indian / Alaska Native alone. Example: if 5.2% of people in a block group identify as AIANALN, the block group has twice the proportion of AIANALN individuals compared to the Southern California RRK region (2.6%), and more than three times the proportion compared to the entire state of California (1.6%). If the local proportion is twice the regional proportion, then AIANALN individuals are highly concentrated locally.

  19. Annual biomass data (2001-2023) for southern California: above- and...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Sep 20, 2022
    Charlie C. Schrader-Patton; Emma C. Underwood; Quinn M. Sorenson (2022). Annual biomass data (2001-2023) for southern California: above- and below-ground, standing dead, and litter [Dataset]. http://doi.org/10.5061/dryad.qz612jmjt
    Available download formats: zip
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Dryad
    Authors
    Charlie C. Schrader-Patton; Emma C. Underwood; Quinn M. Sorenson
    Time period covered
    2022
    Area covered
    Southern California, California
    Description

    Annual biomass data (2001-2021) for southern California: above- and below-ground, standing dead, and litter

    https://doi.org/10.5061/dryad.qz612jmjt

    Description of the data and file structure

    Data description:

    Annual spatial estimates of above-ground live, standing dead, litter, and below-ground biomass (g/m²) for 2001-2023 for southern California.

    These raster layers were created by modeling field-plot biomass as a function of covariates, including precipitation, remotely sensed NDVI, and geophysical data (slope, aspect, elevation).

    For a more complete description, visit https://doi.org/10.5061/dryad.qz612jmjt

    The biomass raster layers are packaged in zip files for each year using the following naming structure:

    WWETAC_UCD_SoCal_Biomass_XXXX.zip

    Where XXXX is the year of the biomass estimates. Within each zip file are the following files:

    WWETAC_UCD_

    Where
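    The yearly archive naming structure WWETAC_UCD_SoCal_Biomass_XXXX.zip described above can be parsed and generated programmatically. The Python sketch below handles only the outer archive names, since the names of the files inside each archive are truncated in this description; the year range is a parameter because the description cites both 2001-2021 and 2001-2023. The function names are hypothetical. The last helper converts the stated g/m² units to Mg/ha (1 ha = 10,000 m²; 1 Mg = 1,000,000 g), a common unit for comparing biomass estimates.

```python
import re
from typing import List, Optional

# Naming structure described above; XXXX is the four-digit year of the estimates.
ARCHIVE_PATTERN = re.compile(r"^WWETAC_UCD_SoCal_Biomass_(\d{4})\.zip$")


def archive_year(filename: str) -> Optional[int]:
    """Return the year encoded in a yearly archive name, or None if it does not match."""
    match = ARCHIVE_PATTERN.match(filename)
    return int(match.group(1)) if match else None


def expected_archives(first_year: int = 2001, last_year: int = 2023) -> List[str]:
    """Archive names for an inclusive range of years."""
    return [f"WWETAC_UCD_SoCal_Biomass_{year}.zip" for year in range(first_year, last_year + 1)]


def g_per_m2_to_mg_per_ha(value_g_per_m2: float) -> float:
    """Convert biomass density from g/m^2 to Mg/ha."""
    return value_g_per_m2 * 10_000 / 1_000_000


print(archive_year("WWETAC_UCD_SoCal_Biomass_2015.zip"))  # 2015
print(g_per_m2_to_mg_per_ha(500))  # 5.0
```

    Comparing `expected_archives()` against a directory listing is a quick way to spot missing years after download.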

  20. Multi-Race, Except Part-American Indian Pop. Concentration - Southern CA

    • wifire-data.sdsc.edu
    geotiff, wcs, wms
    Updated Mar 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California Wildfire & Forest Resilience Task Force (2025). Multi-Race, Except Part-American Indian Pop. Concentration - Southern CA [Dataset]. https://wifire-data.sdsc.edu/dataset/clm-multi-race-except-part-american-indian-pop-concentration-southern-ca
    Available download formats: geotiff, wms, wcs
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    California Wildfire & Forest Resilience Task Force
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, Southern California, California
    Description

    The relative concentration of the Southern California region's population that identifies as "Multiracial," EXCEPT those with part-American Indian identity, in response to the Census questionnaire. "Relative concentration" is a measure that compares the proportion of the population within each Census block group that identifies as Multiracial to the corresponding proportion across all people living within the 13,312 Census block groups of the Southern California RRK region. People with part-American Indian identity are not included here; they are included in the American Indian or Alaska Native Race Alone and Multi-Race Population dataset described above.

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Prices Data (5 new features!)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-prices-data-5-new-features-230f/d4c4de7c/?iid=000-393&v=presentation

‘California Housing Prices Data (5 new features!)’ analyzed by Analyst-2

Dataset updated
Jul 28, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
California
Description

Analysis of ‘California Housing Prices Data (5 new features!)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Similar Datasets:

Boston House Prices: LINK

Context

This dataset is a modified version of the California Housing Data used in the paper Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297. It serves as an excellent introduction to implementing machine learning algorithms because it requires rudimentary data cleaning, has an easily understandable list of variables, and sits at an optimal size between being too toyish and too cumbersome.

The data contains information from the 1990 California census. So although it may not help you with predicting current housing prices like the Zillow Zestimate dataset, it does provide an accessible introductory dataset for teaching people about the basics of machine learning.

Modifications with respect to the original data

This dataset includes 5 extra features defined by me: "Distance to coast", "Distance to Los Angeles", "Distance to San Diego", "Distance to San Jose", and "Distance to San Francisco". These extra features try to account for the distance to the nearest coast and the distance to the centre of the largest cities in California.

The distances were calculated using the Haversine formula with the Longitude and Latitude:

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

where:

  • phi_1 and phi_2 are the Latitudes of point 1 and point 2, respectively
  • lambda_1 and lambda_2 are the Longitudes of point 1 and point 2, respectively
  • r is the radius of the Earth (6,371 km)
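The haversine formula above can be written as a short Python function. The sketch below returns metres (r = 6,371 km expressed as 6,371,000 m) to match the [m] units of the distance columns, and adds a helper that takes the minimum over a set of points, one plausible way to compute "distance to the nearest coast point." The coordinates of the city centres and coastline points used for this dataset are not given in the description, so any points passed in are the caller's assumption, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # r = 6,371 km, in metres to match the [m] columns


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius_m: float = EARTH_RADIUS_M) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to 1.0 so rounding error cannot push asin outside its domain.
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(a)))


def distance_to_nearest(lat: float, lon: float,
                        points: Iterable[Tuple[float, float]]) -> float:
    """Minimum haversine distance in metres from (lat, lon) to any (lat, lon) in `points`."""
    return min(haversine_m(lat, lon, p_lat, p_lon) for p_lat, p_lon in points)


# One degree of longitude along the equator is about 111,195 m.
print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))
```

The haversine formula treats the Earth as a sphere, so distances can differ slightly from ellipsoidal (geodesic) calculations; at the scale of California this is usually acceptable for feature engineering.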

Content

The data pertains to the houses found in a given California district, along with summary statistics about them based on the 1990 census data. The columns are as follows; their names are largely self-explanatory:

1) Median House Value: Median house value for households within a block (measured in US Dollars) [$]
2) Median Income: Median income for households within a block of houses (measured in tens of thousands of US Dollars) [10k$]
3) Median Age: Median age of a house within a block; a lower number is a newer building [years]
4) Total Rooms: Total number of rooms within a block
5) Total Bedrooms: Total number of bedrooms within a block
6) Population: Total number of people residing within a block
7) Households: Total number of households, a group of people residing within a home unit, for a block
8) Latitude: A measure of how far north a house is; a higher value is farther north [°]
9) Longitude: A measure of how far west a house is; a lower (more negative) value is farther west [°]
10) Distance to coast: Distance to the nearest coast point [m]
11) Distance to Los Angeles: Distance to the centre of Los Angeles [m]
12) Distance to San Diego: Distance to the centre of San Diego [m]
13) Distance to San Jose: Distance to the centre of San Jose [m]
14) Distance to San Francisco: Distance to the centre of San Francisco [m]

Source

This data was entirely modified and cleaned by me. The original data (without the distance features) was initially featured in the following paper: Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297.

The original dataset can be found under the following link: https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

--- Original source retains full ownership of the source dataset ---
