5 datasets found
  1. California Urban Area Delineations

    • data.ca.gov
    Updated Dec 2, 2025
    Cite
    California Department of Finance (2025). California Urban Area Delineations [Dataset]. https://data.ca.gov/dataset/california-urban-area-delineations
    Available download formats
    arcgis geoservices rest api, html
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    Calif. Dept. of Finance Demographic Research Unit
    Authors
    California Department of Finance
    Area covered
    California
    Description

    The Census Bureau released revised delineations for urban areas on December 29, 2022. The new criteria (contained in this Federal Register Notice) are based primarily on housing unit density measured at the census block level. To qualify as an urban area, an area must contain at least 2,000 housing units or have a population of at least 5,000. The new criteria also eliminate the distinction between “urbanized areas” and “urban clusters.” This is a change from 2010, when urbanized areas were defined as areas of 50,000 or more people, and urban clusters as areas of at least 2,500 but fewer than 50,000 people, at least 1,500 of whom lived outside group quarters. Because of the new thresholds, 36 former urban clusters in California are no longer considered urban areas, leaving California with 193 urban areas under the new criteria.
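As a rough illustration, the size thresholds described above can be restated as two small Python predicates. This is a sketch only: the function names are hypothetical, and it ignores the block-level housing-density rules the Census Bureau uses to draw each area's boundaries. It only checks whether an already-delineated area meets the minimum size.

```python
# Hypothetical helpers restating the size thresholds described above.
# They do not model the block-level housing-density rules used to draw
# urban area boundaries; they only test an area's totals.


def qualifies_2020(housing_units: int, population: int) -> bool:
    """2020 criteria: at least 2,000 housing units OR at least 5,000 people."""
    return housing_units >= 2_000 or population >= 5_000


def classify_2010(population: int, pop_outside_group_quarters: int) -> str:
    """2010 criteria: urbanized area, urban cluster, or not urban."""
    if population >= 50_000:
        return "urbanized area"
    if population >= 2_500 and pop_outside_group_quarters >= 1_500:
        return "urban cluster"
    return "not urban"


# An area of 3,000 people and 1,200 housing units was an urban cluster in
# 2010 but falls below both 2020 thresholds.
print(classify_2010(3_000, 2_900))   # urban cluster
print(qualifies_2020(1_200, 3_000))  # False
```

The 2020 test is an OR of two thresholds, which is why an area can qualify on housing units alone even with fewer than 5,000 residents.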

    California's total urban population increased by 1,885,884, or 5.3%, between 2010 and 2020. However, the urban share of the state's total population fell from 95% to 94.2%. For more information about the mapped data, download the Excel spreadsheet here.
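The two growth figures in this paragraph also imply an approximate 2010 baseline. The sketch below backs it out with simple arithmetic. Because 5.3% is rounded to one decimal place, it also reports the range of bases consistent with that rounding. The function name is illustrative, and the results are derived estimates, not published figures.

```python
def implied_base(increase: float, pct_change: float) -> float:
    """Back out a starting value from an absolute and a percentage change."""
    return increase / (pct_change / 100)


increase = 1_885_884  # growth in California's urban population, 2010 to 2020
pct = 5.3             # reported percentage increase, rounded to one decimal

base_2010 = implied_base(increase, pct)
urban_2020 = base_2010 + increase
print(f"implied 2010 urban population: ~{base_2010:,.0f}")
print(f"implied 2020 urban population: ~{urban_2020:,.0f}")

# Any true rate in [5.25%, 5.35%) rounds to 5.3%, so the base lies in this band.
low = implied_base(increase, 5.35)
high = implied_base(increase, 5.25)
print(f"consistent 2010 range: ~{low:,.0f} to ~{high:,.0f}")

# The urban share of the total population moved the other way.
share_change_pp = 94.2 - 95.0
print(f"urban share change: {share_change_pp:.1f} percentage points")
```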

    Please note that some of the 2020 urban areas have different names or additional place names because housing unit counts are now used as a secondary naming criterion.

    Please note that four urban areas cross the state boundary into Arizona and Nevada. For 2010, only the parts within California are displayed on the map; however, the population and housing estimates represent the entire urban areas. For 2020, the population and housing unit estimates pertain to the areas within California only.

    Data for this web application was derived from the 2010 and 2020 Censuses (2010 and 2020 Census Blocks, 2020 Urban Areas, and Counties) and the 2016-2020 American Community Survey (2010 Urban Areas), and can be found at data.census.gov.

    For more information about the urban area delineations, visit the Census Bureau's Urban and Rural webpage and FAQ.

    To view more data from the State of California Department of Finance, visit the Demographic Research Unit Data Hub.

  2. Patients Leaving California Hospitals Against Medical Advice (AMA)

    • data.chhs.ca.gov
    • data.ca.gov
    csv, pdf, zip
    Updated Nov 7, 2025
    Cite
    Department of Health Care Access and Information (2025). Patients Leaving California Hospitals Against Medical Advice (AMA) [Dataset]. https://data.chhs.ca.gov/dataset/patients-leaving-california-hospitals-against-medical-advice-ama
    Available download formats
    csv(16351), pdf(110422), pdf(74077), csv(17307), zip
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Area covered
    California
    Description

    These datasets focus on patients leaving California hospitals against medical advice (AMA), which is defined as choosing to leave the hospital before the treating physician recommends discharge. Patients leaving AMA are exposed to higher risks due to inadequately treated medical issues, which may result in the need for readmission.

  3. Gender Pay Gap Dataset

    • kaggle.com
    zip
    Updated Feb 2, 2022
    Cite
    fedesoriano (2022). Gender Pay Gap Dataset [Dataset]. https://www.kaggle.com/datasets/fedesoriano/gender-pay-gap-dataset
    Available download formats
    zip(61650632 bytes)
    Dataset updated
    Feb 2, 2022
    Authors
    fedesoriano
    Description

    Similar Datasets

    • Company Bankruptcy Prediction
    • The Boston House-Price Data
    • California Housing Prices Data (5 new features!)
    • Spanish Wine Quality Dataset

    Context

    The gender pay gap, or gender wage gap, is the average difference between the remuneration of working men and working women. Women are generally considered to be paid less than men. There are two distinct measures of the pay gap: non-adjusted and adjusted. The adjusted gap typically takes into account differences in hours worked, occupation chosen, education, and job experience. In the United States, for example, the average non-adjusted annual salary of women is 79% of the average annual salary of men, compared with 95% for the adjusted measure.

    The reasons for the gap are linked to legal, social, and economic factors, and extend beyond "equal pay for equal work".

    The gender pay gap can be a problem from a public policy perspective because it reduces economic output and means that women are more likely to be dependent upon welfare payments, especially in old age.

    This dataset aims to replicate the data used in the widely cited paper "The Gender Wage Gap: Extent, Trends, and Explanations" by Blau and Kahn (2017). That paper provides new empirical evidence on the extent of and trends in the gender wage gap, which declined considerably during the 1980–2010 period.

    Citation

    fedesoriano. (January 2022). Gender Pay Gap Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/fedesoriano/gender-pay-gap-dataset.

    Content

    There are 2 files in this dataset: a) Panel Study of Income Dynamics (PSID) microdata for the 1980-2010 period, and b) Current Population Survey (CPS) data, which provide additional US national data on the gender pay gap.

    PSID variables:

    Notes: The variables with fz added to their name refer to experience measures in which we have filled in some zeros in the missing PSID years with data from the respondents' answers to questions about jobs worked during those missing years. The fz variables were used in the regression analyses. The variables with a predict prefix refer to the computation of actual experience accumulated during the years in which the PSID did not survey the respondents. In some cases, additional predicted experience levels are needed to impute experience in the missing years. Note that the variables yrsexpf, yrsexpfsz, etc., include these computations, so if you want to use full-time or part-time experience, you don't need to add these predict variables in; they are included in the data set to illustrate the results of the computation process. The variables with an orig prefix are the original PSID variables. These have been processed and, in some cases, renamed for convenience. The hd suffix means that the variable refers to the head of the family (HD), and the wf suffix means that it refers to the wife or female cohabitor (WF), if there is one. As shown in the accompanying regression program, these orig variables aren't used directly in the regressions. The data set also contains additional original PSID variables, which were used to construct the variables used in the regressions.
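The naming conventions in these notes can be applied mechanically when browsing the columns. The following Python sketch tags a column name by its prefix and suffix. It assumes names are plain strings, the function name is hypothetical, and the substring test for fz may misfire on a name that happens to contain those letters for another reason.

```python
def describe_psid_variable(name: str) -> list[str]:
    """Tag a PSID column name using the prefix/suffix conventions in the notes."""
    name = name.lower()
    tags = []
    if name.startswith("orig"):
        tags.append("original PSID variable (not used directly in regressions)")
    if name.startswith("predict"):
        tags.append("predicted experience for years the PSID did not survey")
    if "fz" in name:
        tags.append("experience with missing-year zeros filled from job histories")
    if name.endswith("hd"):
        tags.append("refers to the head of the family")
    elif name.endswith("wf"):
        tags.append("refers to the wife or female cohabitor")
    return tags


# Hypothetical column names, made up to exercise each rule.
for col in ["origlabinchd", "predictexpwf", "yrsexpfz", "age"]:
    print(col, describe_psid_variable(col))
```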

    1. intnum68: 1968 INTERVIEW NUMBER
    2. pernum68: PERSON NUMBER 68
    3. wave: Current Wave of the PSID
    4. sex: gender SEX OF INDIVIDUAL (1=male, 2=female)
    5. intnum: Wave-specific Interview Number
    6. farminc: Farm Income
    7. region: regLab Region of Current Interview
    8. famwgt: this is the PSID’s family weight, which is used in all analyses
    9. relhead: ER34103L this is the relation to the head of household (10=head; 20=legally married wife; 22=cohabiting partner)
    10. age: Age
    11. employed: ER34116L Whether or not employed or on temp leave (everyone gets a 1 for this variable, since our wage analyses use only the currently employed)
    12. sch: schLbl Highest Year of Schooling
    13. annhrs: Annual Hours Worked
    14. annlabinc: Annual Labor Income
    15. occ: 3 Digit Occupation 2000 codes
    16. ind: 3 Digit Industry 2000 codes
    17. white: White, nonhispanic dummy variable
    18. black: Black, nonhispanic dummy variable
    19. hisp: Hispanic dummy variable
    20. othrace: Other Race dummy variable
    21. degree: degreeLbl Agent's Degree Status (0=no college degree; 1=bachelor’s without advanced degree; 2=advanced degree)
    22. degupd: degreeLbl Agent's Degree Status (Updated with 2009 values)
    23. schupd: schLbl Schooling (updated years of schooling)
    24. annwks: Annual Weeks Worked
    25. unjob: unJobLbl Union Coverage dummy variable
    26. usualhrwk: Usual Hrs Worked Per Week
    27. labincbus: Labor Income from...
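
As a starting point for the non-adjusted comparison described under Context, the sketch below computes a female-to-male ratio of weighted mean hourly wages from four PSID variables listed above (sex, annlabinc, annhrs, famwgt). It is an assumption-laden illustration, not the paper's method. Hourly wage is approximated as annual labor income divided by annual hours, rows with zero hours are skipped, and the sample rows are invented. The adjusted gap would require regression controls (education, experience, occupation, and so on) that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    sex: int          # 1 = male, 2 = female (PSID coding listed above)
    annlabinc: float  # annual labor income
    annhrs: float     # annual hours worked
    famwgt: float     # PSID family weight


def weighted_mean_hourly_wage(workers: list[Worker], sex_code: int) -> float:
    """famwgt-weighted mean of annlabinc / annhrs for one sex; skips zero hours."""
    total = weight = 0.0
    for w in workers:
        if w.sex == sex_code and w.annhrs > 0:
            total += w.famwgt * (w.annlabinc / w.annhrs)
            weight += w.famwgt
    if weight == 0:
        raise ValueError(f"no usable rows for sex code {sex_code}")
    return total / weight


def unadjusted_ratio(workers: list[Worker]) -> float:
    """Female-to-male ratio of weighted mean hourly wages, with no controls."""
    return weighted_mean_hourly_wage(workers, 2) / weighted_mean_hourly_wage(workers, 1)


# Invented rows for illustration only; real data would come from the PSID file.
sample = [
    Worker(sex=1, annlabinc=60_000, annhrs=2_000, famwgt=1.0),
    Worker(sex=1, annlabinc=40_000, annhrs=2_000, famwgt=1.0),
    Worker(sex=2, annlabinc=45_000, annhrs=2_000, famwgt=1.0),
    Worker(sex=2, annlabinc=30_000, annhrs=1_500, famwgt=1.0),
]
print(f"non-adjusted female/male hourly wage ratio: {unadjusted_ratio(sample):.2f}")
```

Weighting by famwgt follows the variable list's note that the family weight is used in all analyses.
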
  4. Long-Term Occupational Employment Projections

    • kaggle.com
    zip
    Updated Feb 3, 2025
    Cite
    Amrutha Satishkumar (2025). Long-Term Occupational Employment Projections [Dataset]. https://www.kaggle.com/amruthasatishkumar/long-term-occupational-employment-projections
    Available download formats
    zip(614448 bytes)
    Dataset updated
    Feb 3, 2025
    Authors
    Amrutha Satishkumar
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset provides long-term occupational employment projections for the state of California across various industries. It offers insights into job growth, industry trends, and workforce demand over a 10-year horizon.

    Why is this dataset useful?
    1. Job Market Analysis – Identify which jobs and industries are expected to grow or decline.
    2. Workforce Planning – Helps businesses, policymakers, and educators align training programs with future job demand.
    3. Predictive Modeling – Use this dataset for time-series forecasting, job demand predictions, and labor market analytics.

    Data Details:
    - Timeframe: 2022-2032
    - Geography: State of California
    - Industries Covered: Technology, Healthcare, Retail, Manufacturing, Finance, and more.

    Columns:
    1. Area Type – Specifies the geographic classification (e.g., state-level or regional).
    2. Area Name – The name of the geographic region (e.g., California, specific labor market regions).
    3. Period – The timeframe of the projection, typically from the base year to the projected year (e.g., 2022-2032).
    4. SOC Level – The level of the Standard Occupational Classification (SOC) system used for job categorization.
    5. Standard Occupational Classification (SOC) – A unique code representing a specific occupation based on the SOC system.
    6. Occupational Title – The official job title corresponding to the SOC code.
    7. Base Year Employment Estimate – The estimated number of jobs for the occupation in the base year (e.g., 2022).
    8. Projected Year Employment Estimate – The expected number of jobs for the occupation in the projected year (e.g., 2032).
    9. Numeric Change – The absolute difference in employment between the base year and the projected year.
    10. Percentage Change – The percentage increase or decrease in employment over the projection period.
    11. Exits – Estimated number of workers leaving the occupation due to retirement or career changes.
    12. Transfers – Estimated number of workers transferring into or out of the occupation.
    13. Total Job Openings – The sum of exits, transfers, and new job creation, representing the total expected openings.
    14. Median Hourly Wage – The median hourly wage for the occupation.
    15. Median Annual Wage – The median annual wage for the occupation.
    16. Entry Level Education – The typical minimum education required for the occupation (e.g., high school diploma, bachelor's degree).
    17. Work Experience – The amount of prior work experience typically needed for the occupation.
    18. Job Training – The type of on-the-job training required for entry into the occupation.
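
The derived columns above follow from the base and projected estimates. A minimal Python sketch, assuming a hypothetical Projection record with illustrative numbers, computes Numeric Change, Percentage Change, and Total Job Openings. It treats "new job creation" as Numeric Change, which is an assumption; check the publisher's methodology, particularly for occupations whose employment declines. It also adds a compound annual growth rate over the 10-year horizon, which is not a column in the dataset.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    """One occupation row; field names are hypothetical stand-ins for the columns."""
    base_employment: int        # Base Year Employment Estimate (e.g., 2022)
    projected_employment: int   # Projected Year Employment Estimate (e.g., 2032)
    exits: int
    transfers: int

    @property
    def numeric_change(self) -> int:
        return self.projected_employment - self.base_employment

    @property
    def percentage_change(self) -> float:
        return 100 * self.numeric_change / self.base_employment

    @property
    def total_job_openings(self) -> int:
        # Assumption: "new job creation" is read as Numeric Change, so a
        # declining occupation reduces openings below exits + transfers.
        return self.exits + self.transfers + self.numeric_change

    def annual_growth_rate(self, years: int = 10) -> float:
        """Compound annual growth rate over the projection period, in percent."""
        ratio = self.projected_employment / self.base_employment
        return 100 * (ratio ** (1 / years) - 1)


# Illustrative numbers only, not taken from the dataset.
row = Projection(base_employment=10_000, projected_employment=11_000,
                 exits=4_000, transfers=6_000)
print(row.numeric_change, round(row.percentage_change, 1),
      row.total_job_openings, round(row.annual_growth_rate(), 2))
```

Computing these as properties keeps them consistent with the two employment estimates, which is useful for checking the published columns row by row.
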

    Potential Use Cases: ✔ Career Guidance – Helps individuals choose high-growth career paths. ✔ Economic Research – Understand how employment trends impact the economy. ✔ Machine Learning Models – Build predictive models for workforce demand.

    If you find this dataset useful, please upvote! Your support encourages more high-quality datasets.

  5. Estimated Subsidence in the San Joaquin Valley between 1949 – 2005

    • data.ca.gov
    • data.cnra.ca.gov
    pdf
    Updated May 1, 2019
    Cite
    California Department of Water Resources (2019). Estimated Subsidence in the San Joaquin Valley between 1949 – 2005 [Dataset]. https://data.ca.gov/dataset/estimated-subsidence-in-the-san-joaquin-valley-between-1949-2005
    Available download formats
    pdf
    Dataset updated
    May 1, 2019
    Dataset authored and provided by
    California Department of Water Resources (http://www.water.ca.gov/)
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    San Joaquin Valley
    Description

    San Joaquin Valley Subsidence Analysis README.
    Written: Joel Dudas, 3/12/2017. Amended: Ben Brezing, 4/2/2019.

    DWR’s Division of Engineering Geodetic Branch received a request in 1/2017 from Jeanine Jones to produce a graphic of historic subsidence across the entire San Joaquin Valley. The task was assigned to the Mapping & Photogrammetry Office and the Geospatial Data Support Section, to be completed by early February. After reviewing the alternatives, the team decided to produce contours from the oldest available set of quad maps for which there was reasonable certainty about quality and datum, and to compare them to the most current Valley-wide DEM.

    For the first requirement, research indicated that the 1950s-vintage quad maps for the Valley were the best alternative. Earlier quad map editions are uneven in quality and vintage, and the actual control used for their contour lines is extremely suspect. The 1950s quads, by contrast, were produced primarily from 1948-1949 aerial photography, along with control corresponding to that period, and are referenced to the National Geodetic Vertical Datum of 1929 (NGVD29). For the current surface, the most recent Valley-wide dataset that was freely available, in the public domain, and of reasonable accuracy was the 2005 NextMap SAR acquisition (referenced to NAVD88).

    The bulk of the work focused on digitizing the 1950s contours. First, all of the necessary quads were downloaded from the online USGS quad source https://ngmdb.usgs.gov/maps/Topoview/viewer/#4/41.13/-107.51. Then the entire staff of the Mapping & Photogrammetry Lab (including both the Mapping Office and GDDS staff) digitized the contours. Given the short turnaround and limited budget, certain shortcuts were taken in contour development. While efforts were made to digitize accurately, speed was a priority. Digitizing focused primarily on agricultural and other lowland areas, so highlands were largely skipped. The fine detail of contours along rivers, levees, and hillsides was skipped or simplified, and in some cases only major contours were digitized. The mapping on the source quads itself varied: in a few cases, only spot elevations on benchmarks were available, and the contour interval sometimes varied even within a single quad sheet. In addition, because eight different people created the contours, style and attention to detail vary. Given the purpose of the project (displaying regional subsidence patterns), the historic contour sets fall short of a literal, precise reproduction and leave some things to be desired. These caveats aside, the linework is reasonably accurate for what it is, particularly given that the contours of that era were themselves mapped at an unknown and varying quality.

    After drawing the linework, the digitizers manually tagged each line with a Z value corresponding to the mapped elevation contour. Joel Dudas then performed what could be called a “rough” QA/QC of the contours. The individual lines were stitched together into a single contour set and exported to an elevation raster (using TopoToRaster in ArcGIS 10.4). Gross blunders in Z values were corrected, and gaps in the coverage were filled. The elevation grid was then adjusted to NAVD88 using a single adjustment for the entire coverage area (2.5’, which is a close average of the values in this region). The NextMap data was extracted for the area and converted to feet. The two rasters were aligned to the same origin point. The subsidence grid was then created by subtracting the old contour-derived grid from the NextMap DEM. The subsidence grid that includes all of the values has the suffix “ALL”.

    Then, to improve display fidelity, some of the extreme values (above +5’ and below -20’*) were filtered out of the dataset, and the subsidence grid was regenerated for these areas with the suffix “cut.” The purpose of this cut was to exclude riverine and hilly areas that produced extreme values and other artifacts purely because of the analysis approach (i.e., not real elevation change).

    * Some of the areas with more than 20 feet of subsidence were excluded from this clipping, because they lie in heavily subsided areas and may represent real subsidence.

    The resulting subsidence product should be interpreted in light of the above. Some of the collar of the San Joaquin Valley shows large changes, but that is simply due to the analysis method. Also, individual grid cells may or may not be comparing the same real features, and errors are baked into both comparison datasets. However, the large areas of subsidence in the primary agricultural area agree fairly well with a cruder USGS subsidence map of the Valley based on extensometer data. We are confident that the big-picture story these results show is largely correct, and that the magnitudes of subsidence are somewhat reasonable. The contour set can serve as a baseline for future comparisons as more recent data becomes available.

    There are two key versions of the data. The “Final Deliverables” from 2/2017 were delivered to support the initial Public Affairs press release. Subsequent improvements were made in coverage and blunder correction as time permitted (this work occurred in the midst of the Oroville Dam emergency) to produce the final version as of 3/12/2017. Further improvements in overall quality and filtering could be made in the future if time and needs demand it.
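
The core differencing and "cut" steps described in this README can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: both grids are already aligned to the same origin and cell size, missing cells are None, NextMap elevations are in meters and converted with the international foot, and the 2.5' NGVD29-to-NAVD88 adjustment is added to the older grid (the README gives the magnitude but not the sign). The manual exemptions for heavily subsided areas are not modeled.

```python
M_TO_FT = 3.28084      # meters to international feet (assumed unit convention)
DATUM_SHIFT_FT = 2.5   # single NGVD29 -> NAVD88 adjustment; sign assumed positive


def subsidence_grid(nextmap_m, quad_ngvd29_ft):
    """Cell-by-cell NextMap (NAVD88, ft) minus contour-derived grid (shifted to NAVD88).

    Both inputs are equally shaped lists of rows on the same origin; None = no data.
    Negative results mean the ground is lower now than in the 1950s-era mapping.
    """
    out = []
    for new_row, old_row in zip(nextmap_m, quad_ngvd29_ft):
        out_row = []
        for new_m, old_ft in zip(new_row, old_row):
            if new_m is None or old_ft is None:
                out_row.append(None)
            else:
                out_row.append(new_m * M_TO_FT - (old_ft + DATUM_SHIFT_FT))
        out.append(out_row)
    return out


def cut(grid, upper_ft=5.0, lower_ft=-20.0):
    """Mask values above +5 ft or below -20 ft (manual exemptions not modeled)."""
    return [[v if v is not None and lower_ft <= v <= upper_ft else None for v in row]
            for row in grid]


# Toy 1 x 3 grids; the values are made up for illustration.
all_grid = subsidence_grid([[30.0, 10.0, None]], [[100.0, 20.0, 50.0]])
cut_grid = cut(all_grid)
print(all_grid)
print(cut_grid)
```
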
    Update (4/3/2019, Ben Brezing): The raster was further smoothed to remove artifacts that result from comparing the high-resolution NextMap DEM to the lower-resolution DEM derived from the 1950s quad map contours. The smoothing removed raster cells (25 meter cell size) whose values differ by more than 0.5 feet from adjacent cells, along with those adjacent cells. The resulting raster was then resampled to a 100 meter cell size using cubic resampling and converted to a point feature class. The point feature class was then interpolated to a raster with a 250 meter cell size using inverse distance weighting (IDW) with a fixed search radius of 1250 meters and power = 2. The resulting raster was clipped to a smaller extent to remove noisier areas around the edges of the Central Valley while retaining coverage of the main area of interest.
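
The two key operations in the 2019 update, the 0.5-foot neighbor filter and fixed-radius IDW, are sketched below in plain Python. This is not the ArcGIS implementation. It assumes an 8-cell neighborhood and "differs from any neighbor" as the removal test, marks removed cells as None, skips the cubic resampling and point-conversion steps, and returns None where no points fall within the 1250 meter search radius (ArcGIS's fixed-radius IDW can also enforce a minimum point count, which is not modeled).

```python
import math


def despike(grid, tol_ft=0.5):
    """Remove cells that differ from any 8-neighbor by more than tol_ft, plus those neighbors."""
    rows, cols = len(grid), len(grid[0])
    drop = set()
    for r in range(rows):
        for c in range(cols):
            v = grid[r][c]
            if v is None:
                continue
            nbrs = [(r + dr, c + dc)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols]
            if any(grid[i][j] is not None and abs(v - grid[i][j]) > tol_ft
                   for i, j in nbrs):
                drop.add((r, c))
                drop.update(nbrs)
    return [[None if (r, c) in drop else grid[r][c] for c in range(cols)]
            for r in range(rows)]


def idw(points, x, y, radius_m=1250.0, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (px, py, z) points within radius_m."""
    num = den = 0.0
    for px, py, z in points:
        d = math.hypot(px - x, py - y)
        if d == 0:
            return z  # an exact hit takes the point's own value
        if d <= radius_m:
            w = 1.0 / d ** power
            num += w * z
            den += w
    return num / den if den > 0 else None


print(despike([[1.0, 1.0, 1.0, 1.0, 1.0, 3.0]]))  # [[1.0, 1.0, 1.0, None, None, None]]
print(idw([(0, 0, -4.0), (100, 0, -6.0), (5000, 0, 50.0)], 50, 0))
```

With power = 2, nearby points dominate the estimate, and the fixed radius keeps distant, possibly noisy values from leaking in; the far point at x = 5000 is ignored in the usage line above.
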
