21 datasets found
  1. Medical Service Study Areas

    • data.chhs.ca.gov
    • healthdata.gov
    • +5 more
    Updated Dec 6, 2024
    Cite
    Department of Health Care Access and Information (2024). Medical Service Study Areas [Dataset]. https://data.chhs.ca.gov/dataset/medical-service-study-areas
    Explore at:
    Available download formats: csv, html, geojson, kml, zip, arcgis geoservices rest api
    Dataset updated
    Dec 6, 2024
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description
    This is the current Medical Service Study Area. California Medical Service Study Areas are created by the California Department of Health Care Access and Information (HCAI).

    Check the Data Dictionary for field descriptions.


    Check out the California Healthcare Atlas for more Medical Service Study Area information.

    This is an update to the MSSA geometries and demographics to reflect the new 2020 Census tract data. The Medical Service Study Area (MSSA) polygon layer represents the best fit mapping of all new 2020 California census tract boundaries to the original 2010 census tract boundaries used in the construction of the original 2010 MSSA file. Each of the state's new 9,129 census tracts was assigned to one of the previously established medical service study areas (excluding tracts with no land area), as identified in this data layer. The MSSA Census tract data is aggregated by HCAI, to create this MSSA data layer. This represents the final re-mapping of 2020 Census tracts to the original 2010 MSSA geometries. The 2010 MSSA were based on U.S. Census 2010 data and public meetings held throughout California.


    https://hcai.ca.gov/

    Source of update: American Community Survey 5-year 2006-2010 data for poverty. For source tables, refer to the InfoUSA update procedural documentation. The 2010 MSSA Detail layer was developed to update fields affected by population change. The American Community Survey 5-year 2006-2010 population data pertaining to total population, population in households, race, ethnicity, age, and poverty was used in the update. The 2010 MSSA Census Tract Detail map layer was developed to support geographic information systems (GIS) applications, representing the 2010 census tract geography that is the foundation of the 2010 medical service study area (MSSA) boundaries. This version contains the finalized MSSA reconfiguration boundaries based on the US Census Bureau 2010 Census.

    In 1976, the Garamendi Rural Health Services Act required the development of a geographic framework for determining which parts of the state were rural and which were urban, and for determining which parts of counties and cities had adequate health care resources and which were "medically underserved". Thus, sub-city and sub-county geographic units called "medical service study areas [MSSAs]" were developed, using combinations of census-defined geographic units, established following General Rules promulgated by a statutory commission. After each subsequent census the MSSAs were revised. In the scheduled revisions that followed the 1990 census, community meetings of stakeholders (including county officials and representatives of hospitals and community health centers) were held in larger metropolitan areas. The meetings were designed to develop consensus as to how to draw the sub-city units so as to best display health care disparities.

    The importance of involving stakeholders was heightened in 1992, when the United States Department of Health and Human Services' Health Resources and Services Administration entered a formal agreement to recognize the state-determined MSSAs as "rational service areas" for federal recognition of "health professional shortage areas" and "medically underserved areas". After the 2000 census, two innovations transformed the process and set the stage for GIS to emerge as a major factor in health care resource planning in California. First, the Office of Statewide Health Planning and Development [OSHPD], which organizes the community stakeholder meetings and provides the staff to administer the MSSAs, entered into an Enterprise GIS contract. Second, OSHPD authorized at least one community meeting to be held in each of the 58 counties, a significant number of which were wholly rural or frontier counties. For populous Los Angeles County, 11 community meetings were held.

    As a result, health resource data in California are collected and organized by 541 geographic units. The boundaries of these units were established by community healthcare experts, with the objective of maximizing their usefulness for needs assessment purposes. The most dramatic consequence was introducing a data simultaneously displayed in a GIS format. A two-person team, incorporating healthcare policy and GIS expertise, conducted the series of meetings and supervised the development of the 2000-census configuration of the MSSAs.

    MSSA Configuration Guidelines (General Rules):

    • Each MSSA is composed of one or more complete census tracts.
    • As a general rule, MSSAs are deemed to be "rational service areas [RSAs]" for purposes of designating health professional shortage areas [HPSAs], medically underserved areas [MUAs] or medically underserved populations [MUPs].
    • MSSAs will not cross county lines.
    • To the extent practicable, all census-defined places within the MSSA are within 30 minutes travel time to the largest population center within the MSSA, except in those circumstances where meeting this criterion would require splitting a census tract.
    • To the extent practicable, areas that, standing alone, would meet both the definition of an MSSA and a Rural MSSA, should not be a part of an Urban MSSA.
    • Any Urban MSSA whose population exceeds 200,000 shall be divided into two or more Urban MSSA Subdivisions.
    • Urban MSSA Subdivisions should be within a population range of 75,000 to 125,000, but may not be smaller than five square miles in area. If removing any census tract on the perimeter of the Urban MSSA Subdivision would cause the area to fall below five square miles in area, then the population of the Urban MSSA may exceed 125,000.
    • To the extent practicable, Urban MSSA Subdivisions should reflect recognized community and neighborhood boundaries and take into account such demographic information as income level and ethnicity.

    Rural Definitions:

    • A rural MSSA is an MSSA adopted by the Commission, which has a population density of less than 250 persons per square mile, and which has no census defined place within the area with a population in excess of 50,000. Only the population that is located within the MSSA is counted in determining the population of the census defined place.
    • A frontier MSSA is a rural MSSA adopted by the Commission which has a population density of less than 11 persons per square mile.
    • Any MSSA which is not a rural or frontier MSSA is an urban MSSA.
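
    The rural, frontier, and urban definitions above reduce to a few threshold checks. The following is a minimal sketch of that logic, not HCAI's own procedure: the function names, argument names, and sample inputs are hypothetical, and it assumes the caller already knows the population of the largest census defined place counted within the MSSA.

```python
def classify_mssa(population, area_sq_mi, largest_place_population):
    """Return 'frontier', 'rural', or 'urban' for one MSSA.

    largest_place_population is the population of the largest census
    defined place, counting only residents located inside the MSSA.
    """
    density = population / area_sq_mi  # persons per square mile
    # Rural: density below 250 and no place with more than 50,000 residents.
    is_rural = density < 250 and largest_place_population <= 50_000
    # Frontier: a rural MSSA whose density is below 11.
    if is_rural and density < 11:
        return "frontier"
    return "rural" if is_rural else "urban"


def needs_subdivision(population, definition):
    """Urban MSSAs with more than 200,000 residents are to be subdivided."""
    return definition == "urban" and population > 200_000


# Hypothetical inputs, not taken from the published layer.
print(classify_mssa(5_000, 1_000.0, 2_000))
print(classify_mssa(40_000, 400.0, 20_000))
print(classify_mssa(150_000, 50.0, 120_000))
```

    Frontier is checked before rural because every frontier MSSA is, by definition, also a rural one.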
Last updated December 6th 2024.
  2. Medical Lake, WA Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Sep 16, 2023
    Cite
    Neilsberg Research (2023). Medical Lake, WA Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis [Dataset]. https://www.neilsberg.com/research/datasets/62e47581-3d85-11ee-9abe-0aa64bf2eeb2/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Sep 16, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Medical Lake, Washington
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population and (c) total population, we initially analyzed and categorized the data for each of the age groups. We divided ages between 0 and 85 into buckets of roughly 5 years each. For ages over 85, we aggregated the data into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the Medical Lake, WA population pyramid, which represents the Medical Lake population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for Medical Lake, WA, is 21.5.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for Medical Lake, WA, is 12.5.
    • Total dependency ratio for Medical Lake, WA is 34.0.
    • Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for Medical Lake, WA is 8.0.
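
    The four ratios listed above follow directly from three population counts. The sketch below shows the arithmetic under the definitions given in the list; the function name and the sample counts are hypothetical and were chosen only so that the results match the published ratios. They are not the actual Medical Lake counts.

```python
def dependency_ratios(youth, working_age, elderly):
    """Compute dependency ratios from three population counts.

    youth       -- persons aged 0-14
    working_age -- persons aged 15-64
    elderly     -- persons aged 65 or over
    """
    youth_ratio = 100 * youth / working_age
    old_age_ratio = 100 * elderly / working_age
    return {
        "youth": youth_ratio,
        "old_age": old_age_ratio,
        # Total dependency is the sum of the youth and old-age ratios.
        "total": youth_ratio + old_age_ratio,
        # Working-age persons per person aged 65 or over.
        "potential_support": working_age / elderly,
    }


# Hypothetical counts, picked only to reproduce the ratios quoted above.
ratios = dependency_ratios(youth=215, working_age=1000, elderly=125)
print(ratios)
```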
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: The age group for the Medical Lake population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): The male population of Medical Lake for the selected age group.
    • Population (Female): The female population of Medical Lake for the selected age group.
    • Total Population: The total population of Medical Lake for the selected age group.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Medical Lake Population by Age.

  3. Medical Service Study Area Demographics

    • usc-geohealth-hub-uscssi.hub.arcgis.com
    Updated Nov 10, 2021
    Cite
    Spatial Sciences Institute (2021). Medical Service Study Area Demographics [Dataset]. https://usc-geohealth-hub-uscssi.hub.arcgis.com/datasets/medical-service-study-area-demographics
    Explore at:
    Dataset updated
    Nov 10, 2021
    Dataset authored and provided by
    Spatial Sciences Institute
    Area covered
    Description

    Medical Service Study Areas (MSSAs)

    As defined by California's Office of Statewide Health Planning and Development (OSHPD) in 2013, "MSSAs are sub-city and sub-county geographical units used to organize and display population, demographic and physician data" (Source). Each census tract in CA is assigned to a given MSSA. The most recent MSSA dataset (2014) was used. Spatial data are available via OSHPD at the California Open Data Portal. This information may be useful in studying health equity.

    Definitions:

    • Race/Ethnicity: Race/ethnicity is categorized as: All races/ethnicities, Non-Hispanic (NH) White, NH Black, Asian/Pacific Islander, or Hispanic. "All races" includes all of the above, as well as other and unknown race/ethnicity and American Indian/Alaska Native. The latter two groups are not reported separately due to small numbers for many cancer sites.
    • Racial/Ethnic Composition: Distribution of residents' race/ethnicity (e.g., % Hispanic, % non-Hispanic White, % non-Hispanic Black, % non-Hispanic Asian/Pacific Islander). (Source: US Census, 2010.)
    • Rural: Percent of residents who reside in blocks that are designated as rural. (Source: US Census, 2010.)
    • Foreign Born: Percent of residents who were born outside the United States. (Source: American Community Survey, 2008-2012.)
    • Socioeconomic Status (Neighborhood Level): A composite measure of seven indicator variables created by principal component analysis; indicators include: education, blue-collar job, unemployment, household income, poverty, rent, and house value. Quintiles based on state distribution, with quintile 1 being the lowest SES and 5 being the highest. (Source: American Community Survey, 2008-2012.)

    Spatial extent: California
    Spatial Unit: MSSA
    Created: n/a
    Updated: n/a
    Source: California Health Maps
    Contact Email: gbacr@ucsf.edu
    Source Link: https://www.californiahealthmaps.org/?areatype=mssa&address=&sex=Both&site=AllSite&race=&year=05yr&overlays=none&choropleth=Obesity

  4. Patient demographics.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 5, 2015
    Cite
    Fitzgerald-Hughes, Deirdre; O’Keeffe, Kate M.; Hearnden, Claire H.; Leech, John M.; Brown, Aisling F.; Lalor, Stephen J.; McLoughlin, Rachel M.; Rogers, Thomas R.; Mac Aogáin, Micheál; Lacey, Keenan A.; Murphy, Alison G.; Foster, Timothy J.; Tavakol, Mehri; O’Halloran, Dara P.; Geoghegan, Joan A.; Lavelle, Ed C.; Fennell, Jérôme P.; Humphreys, Hilary; van Wamel, Willem J. (2015). Patient demographics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001916216
    Explore at:
    Dataset updated
    Nov 5, 2015
    Authors
    Fitzgerald-Hughes, Deirdre; O’Keeffe, Kate M.; Hearnden, Claire H.; Leech, John M.; Brown, Aisling F.; Lalor, Stephen J.; McLoughlin, Rachel M.; Rogers, Thomas R.; Mac Aogáin, Micheál; Lacey, Keenan A.; Murphy, Alison G.; Foster, Timothy J.; Tavakol, Mehri; O’Halloran, Dara P.; Geoghegan, Joan A.; Lavelle, Ed C.; Fennell, Jérôme P.; Humphreys, Hilary; van Wamel, Willem J.
    Description

    a Healthcare-associated infections were defined as (i) index positive blood culture collected ≥48hrs after hospital admission, and no signs or symptoms of the infection noted at time of admission; OR (ii) index positive blood culture collected <48hrs after hospital admission if any of the following criteria are met: received intravenous therapy in an ambulatory setting in the 30 days before onset of BSI, attended a hospital clinic or haemodialysis in the 30 days before onset of BSI, hospitalised in an acute care hospital for ≥ 2 days in the 90 days prior to onset of BSI, resident of nursing home or long-term care facility.

    b Staphylococcus aureus bacteraemia was defined as uncomplicated if all of the following criteria were met: exclusion of endocarditis; no evidence of metastatic infection; absence of implanted prostheses; follow-up blood cultures at 2–4 days culture-negative for S. aureus; defervescence within 72 h of initiating effective therapy. Percentages shown are of entire S. aureus BSI population.

    † Three patients had chronic diabetic foot ulcers as a source of their S. aureus BSI, and in all cases the contiguous underlying bone was also found to be infected.

    MRSA = methicillin-resistant Staphylococcus aureus. NA = not applicable. BSI = bloodstream infection.

    Data are displayed as median (interquartile range) and number (percentage). P values are calculated by Mann-Whitney and Fisher’s exact test respectively.

  5. Medical Service Study Area Data Dictionary

    • data.chhs.ca.gov
    • data.ca.gov
    • +4 more
    Updated Sep 5, 2024
    Cite
    Department of Health Care Access and Information (2024). Medical Service Study Area Data Dictionary [Dataset]. https://data.chhs.ca.gov/dataset/medical-service-study-area-data-dictionary
    Explore at:
    Available download formats: kml, zip, html, arcgis geoservices rest api, geojson, csv
    Dataset updated
    Sep 5, 2024
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description
    | Field Name  | Data Type | Description |
    |:------------|:----------|:------------|
    | Statefp     | Number    | US Census Bureau unique identifier of the state |
    | Countyfp    | Number    | US Census Bureau unique identifier of the county |
    | Countynm    | Text      | County name |
    | Tractce     | Number    | US Census Bureau unique identifier of the census tract |
    | Geoid       | Number    | US Census Bureau unique identifier of the state + county + census tract |
    | Aland       | Number    | US Census Bureau defined land area of the census tract |
    | Awater      | Number    | US Census Bureau defined water area of the census tract |
    | Asqmi       | Number    | Area calculated in square miles from the Aland |
    | MSSAid      | Text      | ID of the Medical Service Study Area (MSSA) the census tract belongs to |
    | MSSAnm      | Text      | Name of the Medical Service Study Area (MSSA) the census tract belongs to |
    | Definition  | Text      | Type of MSSA, possible values are urban, rural and frontier. |
    | TotalPovPop | Number    | US Census Bureau total population for whom poverty status is determined of the census tract, taken from the 2020 ACS 5 YR S1701 |
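
    Because each row in this layout describes one census tract, MSSA-level figures come from grouping rows on MSSAid. The sketch below illustrates that roll-up and the composition of Geoid from its parts. It is an assumption-laden example, not HCAI's aggregation code: the helper names and sample rows are hypothetical, and it assumes the numeric identifier fields have lost their leading zeros and need padding to the standard 2-digit state, 3-digit county, and 6-digit tract widths.

```python
from collections import defaultdict


def make_geoid(statefp, countyfp, tractce):
    """Build the 11-character tract GEOID: state (2) + county (3) + tract (6)."""
    return f"{int(statefp):02d}{int(countyfp):03d}{int(tractce):06d}"


def aggregate_by_mssa(tracts):
    """Sum tract area and poverty-universe population for each MSSAid."""
    totals = defaultdict(lambda: {"Asqmi": 0.0, "TotalPovPop": 0, "tracts": 0})
    for row in tracts:
        entry = totals[row["MSSAid"]]
        entry["Asqmi"] += row["Asqmi"]
        entry["TotalPovPop"] += row["TotalPovPop"]
        entry["tracts"] += 1
    return dict(totals)


# Hypothetical tract rows using the field names from the data dictionary.
sample_tracts = [
    {"Statefp": 6, "Countyfp": 1, "Tractce": 400100, "MSSAid": "A", "Asqmi": 2.5, "TotalPovPop": 3000},
    {"Statefp": 6, "Countyfp": 1, "Tractce": 400200, "MSSAid": "A", "Asqmi": 0.5, "TotalPovPop": 2000},
    {"Statefp": 6, "Countyfp": 3, "Tractce": 10000, "MSSAid": "B", "Asqmi": 700.0, "TotalPovPop": 1200},
]

for row in sample_tracts:
    print(make_geoid(row["Statefp"], row["Countyfp"], row["Tractce"]), row["MSSAid"])
print(aggregate_by_mssa(sample_tracts))
```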
  6. Mexico-WHO Health Indicators

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Mexico-WHO Health Indicators [Dataset]. https://www.kaggle.com/datasets/thedevastator/mexico-who-health-indicators
    Explore at:
    Available download formats: zip (818791 bytes)
    Dataset updated
    Jan 22, 2023
    Authors
    The Devastator
    Area covered
    Mexico
    Description

    Mexico-WHO Health Indicators

    Demographic, Disease, and Treatment Coverage Data

    By Humanitarian Data Exchange [source]

    About this dataset

    This Kaggle dataset contains a wide array of health and socioeconomic indicators relating to Mexico. It covers topics ranging from mortality and global health estimates to Sustainable Development Goals, Millennium Development Goals (MDGs), Health Systems, Malaria and Tuberculosis, Child Health, Infectious Diseases, World Health Statistics, Health Financing and Public Health & Environment. Furthermore, it includes indicators for Substance Use & Mental Health; Tobacco use; Injuries & Violence; HIV/AIDS & Other STIs; Nutrition; Urban Health; Noncommunicable Diseases (NCDs); Neglected Tropical Diseases (NTDs); Infrastructure; Essential Technologies in healthcare systems; Demographic & Socioeconomic Statistics. Finally, it features indicators on International Regulations Monitoring Frameworks as well as Insecticide Resistance, amongst other topics.

    This dataset is bursting with information on how Mexico stands across its development spectrum, enabling researchers to gain deeper insight into the country's ecosystem and providing them with the data required to pinpoint potential ‘hotspots’: areas that may require heightened attention from policy makers or from individuals looking for smarter ways to direct their efforts so that they benefit their target population most efficiently. Don’t miss your chance at unlocking the power of this comprehensive dataset so you can make sure that no stone is left unturned when it comes to realising tangible outcomes from your research!

    How to use the dataset

    The dataset is organized into several key categories and each category contains a number of different indicators related to that particular area of healthcare. In order to better understand any given indicator in more detail, each one also has an associated metadata page with additional information about its definition and calculation method.

    In order to make use of the data in this dataset there are several steps you will need to take:
    - Decide what aspect or area of healthcare you would like to explore in more detail;
    - Review and understand any associated metadata provided regarding its definition or calculation method;
    - Download any necessary files containing relevant numbers or figures;
    - Analyze or explore this data further;
    - Use your findings to inform decisions about policy interventions for improving general public health outcomes in Mexico!

    Research Ideas

    • Analyzing Mexico's progress towards achieving the desired health indicators for the Sustainable Development Goals (SDGs).
    • Examining how access to healthcare and mental health services vary by region, as well as disparities in treatment within regions.
    • Developing machine learning models to predict outcomes based on different factors such as environment and socioeconomic status.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: infrastructure-indicators-for-mexico-11.csv

    | Column name            | Description                                                    |
    |:-----------------------|:---------------------------------------------------------------|
    | GHO (CODE)             | The Global Health Observatory code for the indicator. (String) |
    | GHO (DISPLAY)          | The name of the indicator. (String)                            |
    | GHO (URL)              | The URL for the indicator. (URL)                               |
    | PUBLISHSTATE (CODE)    | The code for the publication state of the indicator. (String)  |
    | PUBLISHSTATE (DISPLAY) | The name of the publication state of the indicator. (String)   |
    | PUBLISHSTATE (URL)     | The URL for the publication state of the indicator. (URL)      |
    | YEAR (CODE)            | The code for the year of the indicator. (String)               |
    | YEAR (DISPLAY)         | The name of the year of the indicator. (String)                |
    | YEAR (URL)             | The URL for the year of the indicator. (URL)                   |
    | REGION (CODE)          | The code for the region of the indicator. (String)             |
    | REGION (DISPLAY)       | The name of the region of the indicator. (String)              |
    | REGION (URL)           |...
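
    A first pass over a file with these columns might simply list which years each indicator covers. The sketch below does that with the standard csv module. It relies only on column names shown in the listing above; the indicator codes, names, and rows in the sample are hypothetical stand-ins for the real file, and the columns cut off by the truncated listing are not used.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; a real run would open the downloaded CSV instead.
SAMPLE_CSV = """GHO (CODE),GHO (DISPLAY),YEAR (DISPLAY),REGION (DISPLAY)
EXAMPLE_A,Example indicator A,2015,Americas
EXAMPLE_A,Example indicator A,2010,Americas
EXAMPLE_B,Example indicator B,2015,Americas
"""


def years_by_indicator(csv_file):
    """Map each indicator name to the sorted list of years it appears in."""
    years = defaultdict(set)
    for row in csv.DictReader(csv_file):
        years[row["GHO (DISPLAY)"]].add(row["YEAR (DISPLAY)"])
    return {name: sorted(values) for name, values in years.items()}


coverage = years_by_indicator(io.StringIO(SAMPLE_CSV))
for name, covered in coverage.items():
    print(name, covered)
```

    Years are kept as strings because the column listing describes them as String values.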

  7. [Archived] COVID-19 Deaths by Population Characteristics Over Time

    • data.sfgov.org
    • healthdata.gov
    • +1 more
    csv, xlsx, xml
    Updated Jun 27, 2024
    Cite
    (2024). [Archived] COVID-19 Deaths by Population Characteristics Over Time [Dataset]. https://data.sfgov.org/Health-and-Social-Services/-Archived-COVID-19-Deaths-by-Population-Characteri/kkr3-wq7h
    Explore at:
    Available download formats: xlsx, xml, csv
    Dataset updated
    Jun 27, 2024
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    As of July 2nd, 2024 the COVID-19 Deaths by Population Characteristics Over Time dataset has been retired. This dataset is archived and will no longer update. We will be publishing a cumulative deaths by population characteristics dataset that will update moving forward.

    A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable.

    Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.

    B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death certificates are maintained by the California Department of Public Health.

    Data on the population characteristics of COVID-19 deaths are from:
    • Case reports
    • Medical records
    • Electronic lab reports
    • Death certificates

    Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.

    To protect resident privacy, we summarize COVID-19 data by only one characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.

    Data notes on each population characteristic type are listed below.

    Race/ethnicity
    • We include all race/ethnicity categories that are collected for COVID-19 cases.

    Gender
    • The City collects information on gender identity using these guidelines.

    C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.

    Dataset will not update on the business day following any federal holiday.

    D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

    This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.

    New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
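
    The relationship between new and cumulative deaths described above is a per-group running total. The following is a minimal sketch of that calculation, assuming rows of (date, characteristic group, new deaths) with ISO-formatted dates; the function name and the sample rows are hypothetical and do not come from the dataset.

```python
from collections import defaultdict


def add_cumulative(rows):
    """Append a running total per characteristic group to each row.

    rows is an iterable of (iso_date, group, new_deaths) tuples.
    """
    running = defaultdict(int)
    result = []
    # ISO dates sort chronologically as plain strings.
    for date, group, new_deaths in sorted(rows):
        running[group] += new_deaths
        result.append((date, group, new_deaths, running[group]))
    return result


# Hypothetical rows for two groups within one characteristic type.
sample_rows = [
    ("2021-01-02", "Group X", 1),
    ("2021-01-01", "Group X", 2),
    ("2021-01-01", "Group Y", 0),
    ("2021-01-03", "Group X", 0),
    ("2021-01-02", "Group Y", 3),
]

for row in add_cumulative(sample_rows):
    print(row)
```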

    This data may not be immediately available for more recent deaths. Data updates as more information becomes available.

    To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.

    E. CHANGE LOG

    • 9/11/2023 - on this date, we began using an updated definition of a COVID-19 death to align with the California Department of Public Health. This change was applied to COVID-19 deaths retrospectively beginning on 1/1/2023. More information about the recommendation by the Council of State and Territorial Epidemiologists that motivated this change can be found at https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf.
    • 6/6/2023 - data on deaths by transmission type have been removed. See section ARCHIVED DATA for more detail.
    • 5/16/2023 - data on deaths by sexual orientation, comorbidities, homelessness, and single room occupancy have been removed. See section ARCHIVED DATA for more detail.
    • 4/6/2023 - the State implemented system updates to improve the integrity of historical data.
    • 1/31/2023 - column “population_estimate” added.
    • 3/23/2022 - ‘Native American’ changed to ‘American Indian or Alaska Native’ to align with the census.
    • 1/22/2022 - system updates to improve timeliness and accuracy of cases and deaths data were implemented.

  8. MIMIC-III Clinical Database

    • physionet.org
    • oppositeofnorth.com
    Updated Sep 4, 2016
    Cite
    Alistair Johnson; Tom Pollard; Roger Mark (2016). MIMIC-III Clinical Database [Dataset]. http://doi.org/10.13026/C2XW26
    Explore at:
    Dataset updated
    Sep 4, 2016
    Authors
    Alistair Johnson; Tom Pollard; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (including post-hospital discharge).

    MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors: it is freely available to researchers worldwide; it encompasses a diverse and very large population of ICU patients; and it contains highly granular data, including vital signs, laboratory results, and medications.

  9. COVID-19 Deaths by Population Characteristics

    • catalog.data.gov
    • data.sfgov.org
    • +2 more
    Updated Oct 25, 2025
    Cite
    data.sfgov.org (2025). COVID-19 Deaths by Population Characteristics [Dataset]. https://catalog.data.gov/dataset/covid-19-deaths-by-population-characteristics
    Explore at:
    Dataset updated
    Oct 25, 2025
    Dataset provided by
    data.sfgov.org
    Description

    A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals may increase or decrease.

    Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.

    B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists. Death certificates are maintained by the California Department of Public Health.

    Data on the population characteristics of COVID-19 deaths are from:
    • Case reports
    • Medical records
    • Electronic lab reports
    • Death certificates

    Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.

    To protect resident privacy, we summarize COVID-19 data by only one population characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.

    Data notes on select population characteristic types are listed below.

    Race/ethnicity
    • We include all race/ethnicity categories that are collected for COVID-19 cases.

    Gender
    • The City collects information on gender identity using these guidelines.

    C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.

    Dataset will not update on the business day following any federal holiday.

    D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a dataset based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2018-2022 5-year American Community Survey (ACS).

    This dataset includes several characteristic types. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of cumulative deaths.

    Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.

    To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.

    E. CHANGE LOG

  10. MIMIC-III - Deep Reinforcement Learning

    • kaggle.com
    zip
    Updated Apr 7, 2022
    Cite
    Asjad K (2022). MIMIC-III - Deep Reinforcement Learning [Dataset]. https://www.kaggle.com/datasets/asjad99/mimiciii
    Available download formats: zip (11100065 bytes)
    Dataset updated
    Apr 7, 2022
    Authors
    Asjad K
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Digitization of healthcare data, along with algorithmic breakthroughs in AI, will have a major impact on healthcare delivery in the coming years. It is interesting to see AI applied to assist clinicians during patient treatment in a privacy-preserving way. While scientific knowledge can help guide interventions, there remains a key need to quickly cut through the space of decision policies to find effective strategies to support patients during the care process.

    Offline reinforcement learning (also referred to as safe or batch reinforcement learning) is a promising sub-field of RL that provides a mechanism for solving real-world sequential decision-making problems where access to a simulator is not available. Here we assume that we learn a policy from a fixed dataset of trajectories without further interaction with the environment (the agent doesn't receive a reward or punishment signal from the environment). It has been shown that such an approach can leverage vast amounts of existing logged data (in the form of previous interactions with the environment) and can outperform supervised learning approaches or heuristic-based policies for solving real-world decision-making problems. Offline RL algorithms, when trained on sufficiently large and diverse offline datasets, can produce close-to-optimal policies (the ability to generalize beyond the training data).
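
    To make the offline setting concrete, the sketch below runs tabular Q-learning over a fixed list of logged transitions, with no calls to an environment. It is a toy illustration and not the deep RL method used in this project; the transition format (state, action, reward, next_state, done), the names offline_q_learning and greedy_action, and the hyperparameters are all assumptions chosen for the example.

```python
from collections import defaultdict


def offline_q_learning(transitions, n_actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Tabular Q-learning over a fixed batch of logged transitions.

    Hypothetical helper for illustration: each transition is a tuple
    (state, action, reward, next_state, done). No environment is queried;
    the same logged data is replayed for a number of sweeps.
    """
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(sweeps):
        for state, action, reward, next_state, done in transitions:
            # Bootstrap from the best next-state value unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
    return q


def greedy_action(q, state):
    """Return the action with the highest learned value in `state`."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)


# Toy logged data (made up): in state "s0", action 1 led to reward 1.0
# and action 0 led to reward 0.0; both transitions ended the episode.
logged = [
    ("s0", 0, 0.0, "end", True),
    ("s0", 1, 1.0, "end", True),
]
q_table = offline_q_learning(logged, n_actions=2)
best = greedy_action(q_table, "s0")
```

    A tabular method like this only has values for state-action pairs that appear in the log, which is one reason practical offline RL relies on function approximation and large, diverse datasets.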

    As part of my PhD research, I investigated the problem of developing a Clinical Decision Support System for sepsis management using offline deep reinforcement learning.

    MIMIC-III ('Medical Information Mart for Intensive Care') is a large, open-access, anonymized, single-center database which consists of comprehensive clinical data of 61,532 critical care admissions from 2001–2012 collected at a Boston teaching hospital. The dataset consists of 47 features (including demographics, vitals, and lab test results) on a cohort of sepsis patients who meet the Sepsis-3 definition criteria.

    We try to answer the following question:

    Given a particular patient’s characteristics and physiological information at each time step as input, can our deep RL approach learn an optimal treatment policy that can prescribe the right intervention (e.g., use of a ventilator) to the patient at each stage of the treatment process, in order to improve the final outcome (e.g., patient mortality)?

    We can use popular state-of-the-art algorithms such as Deep Q-Learning (DQN), Double Deep Q-Learning (DDQN), DDQN combined with BNC, Mixed Monte Carlo (MMC), and Persistent Advantage Learning (PAL). Using these methods, we can train an RL policy to recommend an optimal treatment path for a given patient.

    Data acquisition, standard pre-processing, and modelling details can be found in the GitHub repo: https://github.com/asjad99/MIMIC_RL_COACH

  11. Socio-demographics of the study population (n = 671).

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 31, 2023
    Cite
    Birte Pantenburg; Claudia Sikorski; Melanie Luppa; Georg Schomerus; Hans-Helmut König; Perla Werner; Steffi G. Riedel-Heller (2023). Socio-demographics of the study population (n = 671). [Dataset]. http://doi.org/10.1371/journal.pone.0048113.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Birte Pantenburg; Claudia Sikorski; Melanie Luppa; Georg Schomerus; Hans-Helmut König; Perla Werner; Steffi G. Riedel-Heller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SD = standard deviation; ow = overweight.
    * Definition of migrational background adopted from the Federal Statistical Office of Germany: participant not born in Germany or not in possession of a German passport, or at least one of the participant’s parents not born in Germany [38].

  12. Assessing the validity of a data driven segmentation approach: A 4 year...

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Lian Leng Low; Shi Yan; Yu Heng Kwan; Chuen Seng Tan; Julian Thumboo (2023). Assessing the validity of a data driven segmentation approach: A 4 year longitudinal study of healthcare utilization and mortality [Dataset]. http://doi.org/10.1371/journal.pone.0195243
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lian Leng Low; Shi Yan; Yu Heng Kwan; Chuen Seng Tan; Julian Thumboo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background
    Segmentation of heterogeneous patient populations into parsimonious and relatively homogeneous groups with similar healthcare needs can facilitate healthcare resource planning and development of effective integrated healthcare interventions for each segment. We aimed to apply a data-driven, healthcare utilization-based clustering analysis to segment a regional health system patient population and validate its discriminative ability on 4-year longitudinal healthcare utilization and mortality data.

    Methods
    We extracted data from the Singapore Health Services Electronic Health Intelligence System, an electronic medical record database that included healthcare utilization (inpatient admissions, specialist outpatient clinic visits, emergency department visits, and primary care clinic visits), mortality, diseases, and demographics for all adult Singapore residents who resided in and had a healthcare encounter with our regional health system in 2012. Hierarchical clustering analysis (Ward’s linkage) and K-means cluster analysis using age and healthcare utilization data in 2012 were applied to segment the selected population. These segments were compared using their demographics (other than age) and morbidities in 2012, and longitudinal healthcare utilization and mortality from 2013–2016.

    Results
    Among 146,999 subjects, five distinct patient segments (“Young, healthy”; “Middle age, healthy”; “Stable, chronic disease”; “Complicated chronic disease”; and “Frequent admitters”) were identified. Healthcare utilization patterns in 2012, morbidity patterns, and demographics differed significantly across all segments. The “Frequent admitters” segment had the smallest number of patients (1.79% of the population) but consumed 69% of inpatient admissions, 77% of specialist outpatient visits, 54% of emergency department visits, and 23% of primary care clinic visits in 2012. 11.5% and 31.2% of this segment have end stage renal failure and malignancy respectively. The validity of cluster-analysis derived segments is supported by discriminative ability for longitudinal healthcare utilization and mortality from 2013–2016. Incident rate ratios for healthcare utilization and Cox hazards ratio for mortality increased as patient segments increased in complexity. Patients in the “Frequent admitters” segment accounted for a disproportionate healthcare utilization and 8.16 times higher mortality rate.

    Conclusion
    Our data-driven clustering analysis on a general patient population in Singapore identified five patient segments with distinct longitudinal healthcare utilization patterns and mortality risk to provide an evidence-based segmentation of a regional health system’s healthcare needs.

  13. Healthcare Symptoms–Disease Classification Dataset

    • kaggle.com
    zip
    Updated Nov 16, 2025
    Cite
    Kundan Sagar Bedmutha (2025). Healthcare Symptoms–Disease Classification Dataset [Dataset]. https://www.kaggle.com/datasets/kundanbedmutha/healthcare-symptomsdisease-classification-dataset
    Available download formats: zip (373982 bytes)
    Dataset updated
    Nov 16, 2025
    Authors
    Kundan Sagar Bedmutha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 25,000 synthetic healthcare records designed for machine learning models that classify diseases based on patient symptoms. It includes demographic attributes, symptom lists, and confirmed diagnoses across 30 common acute, chronic, infectious, and neurological diseases.

    The dataset is well-suited for:

    • Multi-class disease classification
    • Symptom pattern analysis
    • Medical decision support modeling
    • NLP feature extraction on symptom text
    • Data mining and biomedical research

    Each record corresponds to a unique patient with a generated combination of symptoms and diagnosis created from realistic patterns while maintaining anonymity.

    This dataset is purely synthetic, meaning no real patient data is used.

    📌 Column Descriptions

    • Patient_ID: A randomized unique identifier assigned to each synthetic patient.
    • Age: Age of the patient (ranging from 1 to 90 years).
    • Gender: Gender of the patient (Male, Female, or Other).
    • Symptoms: A comma-separated list containing 3 to 7 symptoms.
    • Symptom_Count: Total number of symptoms listed for the patient.
    • Disease: The diagnosed condition; one of the 30 diseases included in the dataset.
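
    As a rough sketch of how these columns might be consumed, the example below parses the comma-separated Symptoms cell and checks it against Symptom_Count. The two sample rows are made up for illustration, and the exact file layout (a CSV whose header matches the column names above) is an assumption rather than something the listing confirms.

```python
import csv
import io

# Made-up sample rows that follow the column descriptions above.
SAMPLE_CSV = """Patient_ID,Age,Gender,Symptoms,Symptom_Count,Disease
P0001,34,Female,"fever, cough, fatigue",3,Influenza
P0002,61,Male,"headache, nausea, dizziness, blurred vision",4,Migraine
"""


def parse_symptoms(cell):
    """Split a comma-separated symptom cell into a clean, lowercase list."""
    return [s.strip().lower() for s in cell.split(",") if s.strip()]


def load_records(text):
    """Parse rows and verify Symptom_Count against the parsed symptom list."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        symptoms = parse_symptoms(row["Symptoms"])
        if len(symptoms) != int(row["Symptom_Count"]):
            raise ValueError(f"Symptom_Count mismatch for {row['Patient_ID']}")
        records.append(
            {"age": int(row["Age"]), "symptoms": symptoms, "disease": row["Disease"]}
        )
    return records


records = load_records(SAMPLE_CSV)
```

    Splitting the symptom text into a list up front makes it straightforward to build bag-of-symptoms features for the multi-class classification use case.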

    🦠 List of Diseases Included

    Common Cold, Influenza, COVID-19, Pneumonia, Tuberculosis, Diabetes, Hypertension, Asthma, Heart Disease, Chronic Kidney Disease, Gastritis, Food Poisoning, Irritable Bowel Syndrome (IBS), Liver Disease, Ulcer, Migraine, Epilepsy, Stroke, Dementia, Parkinson’s Disease, Allergy, Arthritis, Anemia, Thyroid Disorder, Obesity, Depression, Anxiety, Dermatitis, Sinusitis, Bronchitis.

    🎯 Possible Use Cases

    • Multi-class disease prediction
    • Symptom pattern analysis
    • Clinical decision support prototypes
    • NLP-based text classification
    • Educational and academic projects

    🔐 License

    This dataset is released under CC0 Public Domain, meaning it is free to use, modify, and share without restrictions.

  14. National Health & Nutrition Exam Survey 2017-2018

    • kaggle.com
    zip
    Updated Jan 12, 2024
    Cite
    Riley Zurrin (2024). National Health & Nutrition Exam Survey 2017-2018 [Dataset]. https://www.kaggle.com/rileyzurrin/national-health-and-nutrition-exam-survey-2017-2018
    Available download formats: zip (12252608 bytes)
    Dataset updated
    Jan 12, 2024
    Authors
    Riley Zurrin
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    As of January 2024, this is the most recent NHANES dataset whose data collection was not affected by COVID-19.

    Context

    The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and has the responsibility for producing vital and health statistics for the Nation.

    The NHANES program began in the early 1960s and has been conducted as a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements to meet emerging needs. The survey examines a nationally representative sample of about 5,000 persons each year. These persons are located in counties across the country, 15 of which are visited each year.

    The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.

    To date, thousands of research findings have been published using the NHANES data.

    Content

    The 2017-2018 NHANES datasets include the following components:

    1. Demographics dataset:

    • A complete variable dictionary can be found here

    2. Examinations dataset, which contains factors like:

    • Blood pressure

    • Body measures

    • Muscle strength - grip test

    • Oral health - dentition

    • Taste & smell

    • A complete variable dictionary can be found here

    3. Dietary data - total nutrient intake:

    • A complete variable dictionary can be found here

    4. Laboratory dataset, which includes factors like:

    • Albumin & Creatinine - Urine

    • Apolipoprotein B

    • Blood Lead, Cadmium, Total Mercury, Selenium, and Manganese

    • Blood mercury: inorganic, ethyl and methyl

    • Cholesterol - HDL

    • Cholesterol - LDL & Triglycerides

    • Cholesterol - Total

    • Complete Blood Count with 5-part Differential - Whole Blood

    • Copper, Selenium & Zinc - Serum

    • Fasting Questionnaire

    • Fluoride - Plasma

    • Fluoride - Water

    • Glycohemoglobin

    • Hepatitis A

    • Hepatitis B Surface Antibody

    • Hepatitis B: core antibody, surface antigen, and Hepatitis D antibody

    • Hepatitis C RNA (HCV-RNA) and Hepatitis C Genotype

    • Hepatitis E: IgG & IgM Antibodies

    • Herpes Simplex Virus Type-1 & Type-2

    • HIV Antibody Test

    • Human Papillomavirus (HPV) - Oral Rinse

    • Human Papillomavirus (HPV) DNA - Vaginal Swab: Roche Cobas & Roche Linear Array

    • Human Papillomavirus (HPV) DNA Results from Penile Swab Samples: Roche Linear Array

    • Insulin

    • Iodine - Urine

    • Perchlorate, Nitrate & Thiocyanate - Urine

    • Perfluoroalkyl and Polyfluoroalkyl Substances (formerly Polyfluoroalkyl Chemicals - PFC)

    • Personal Care and Consumer Product Chemicals and Metabolites

    • Phthalates and Plasticizers Metabolites - Urine

    • Plasma Fasting Glucose

    • Polycyclic Aromatic Hydrocarbons (PAH) - Urine

    • Standard Biochemistry Profile

    • Tissue Transglutaminase Assay (IgA-TTG) & IgA Endomyseal Antibody Assay (IgA EMA)

    • Trichomonas - Urine

    • Two-hour Oral Glucose Tolerance Test

    • Urinary Chlamydia

    • Urinary Mercury

    • Urinary Speciated Arsenics

    • Urinary Total Arsenic

    • Urine Flow Rate

    • Urine Metals

    • Urine Pregnancy Test

    • Vitamin B12

    • A complete variable dictionary can be found here

    5. Questionnaire dataset, which includes items like:

    • Acculturation

    • Alcohol Use

    • Blood Pressure & Cholesterol

    • Cardiovascular Health

    • Consumer Behavior

    • Current Health Status

    • Dermatology

    • Diabetes

    • Diet Behavior & Nutrition

    • Disability

    • Drug Use

    • Early Childhood

    • Food Security

    • Health Insurance

    • Hepatitis

    • Hospital Utilization & Access to Care

    • Housing Characteristics

    • Immunization

    • Income

    • Medical Conditions

    • Mental Health - Depression Screener

    • Occupation

    • Oral Health

    • Osteoporosis

    • Pesticide Use

    • Physical Activity

    • Physical Functioning

    • Preventive Aspirin Us...

  15. Global Country Information Dataset 2023

    • kaggle.com
    zip
    Updated Jul 8, 2023
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
    Available download formats: zip (24063 bytes)
    Dataset updated
    Jul 8, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.

    Data Source: This dataset was compiled from multiple data sources

  16. Infectious Disease Prediction

    • kaggle.com
    zip
    Updated Jul 14, 2020
    Cite
    Haithem Hermassi (2020). Infectious Disease Prediction [Dataset]. https://www.kaggle.com/haithemhermessi/infectious-disease-prediction
    Available download formats: zip (1804291 bytes)
    Dataset updated
    Jul 14, 2020
    Authors
    Haithem Hermassi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    These data contain counts and rates for Centers for Infectious Diseases-related disease cases among California residents by county, disease, sex, and year spanning 2001-2014 (as of September 2015). Data were extracted on communicable disease cases with an estimated onset or diagnosis date from 2001 through 2014 from California Confidential Morbidity Reports and/or Laboratory Reports that were submitted to CDPH by September 2015 and which met the surveillance case definition for that disease. Cleansing and exploration steps have been performed to generate the train and test datasets.

    Content

    The train dataset contains 75614 rows and the test dataset has 18904 rows.

    Features:
    • Disease (plain text): The name of the disease reported for the patient.
    • County (plain text): The county in which the case resided when they were diagnosed and/or where they are currently receiving care; in most cases this will be the county that reported the case.
    • Year (number): Year is derived from the estimated illness onset date. We defined the estimated illness onset date for each case as the date closest to the time when symptoms first appeared. Because date of illness onset may not be recorded, the estimated date of illness onset can range from the first appearance of symptoms to the date the report was made to CDPH. For diseases with insidious illness onset (for instance, coccidioidomycosis), estimated illness onset was more frequently drawn from the diagnosis date. Values include years spanning 2001-2014, unless otherwise indicated below.
    • Sex (plain text): Values include Male, Female.
    • Count (number): The number of occurrences of each disease that meet the surveillance definition and/or inclusion criteria specific to that disease for that County, Year, Sex strata. National surveillance case definitions for these conditions can be found at http://wwwn.cdc.gov/nndss/case-definitions.html.
    • Population (number): The estimated population size (rounded to the nearest integer) for each County, Year, Sex strata. California Department of Finance (DOF) Population Projection data (P-3 data table) were used to determine the population proportion of a particular demographic subgroup relative to the total State/County population for a given year. These proportions were then applied to the DOF Estimate totals (E-2 data table) for the given State/County and year total, to obtain the estimates used. These data are available at http://www.dof.ca.gov/research/demographic/reports/view.php. Value: a positive integer.
    • Rate (number): The rate of disease per 100,000 population for the corresponding County, Year, Sex strata using the standard calculation (Count * 100,000 / Population). Value: a positive real number (xxx.xxx).
    • CI.lower (number): The lower bound of the 95% confidence interval for the calculated rate. The confidence interval was calculated with the R software package (R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.) using the "Exact Pearson-Klopper" method (more commonly called Clopper-Pearson) as implemented in the "binom" package (Sundar Dorai-Raj (2014). binom: Binomial Confidence Intervals For Several Parameterizations. R package version 1.1-1. http://CRAN.R-project.org/package=binom). Value: a positive real number (xxx.xxx).
    • CI.upper (number): The upper bound of the 95% confidence interval for the calculated rate, calculated as above. Value: a positive real number (xxx.xxx).
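
    The Rate and CI columns can be approximated with a short standard-library sketch. The rate follows the formula given above; the interval is an independent bisection-based version of the exact (Clopper-Pearson) binomial interval, not the R "binom" package itself. The function names, the sample count and population, and the step of scaling the bounds to a per-100,000 basis are assumptions made for illustration.

```python
import math


def rate_per_100k(count, population):
    """Rate of disease per 100,000 population: Count * 100,000 / Population."""
    return count * 100_000 / population


def _binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with each term computed in log space."""
    if x < 0:
        return 0.0
    if x >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for k in range(x + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve_increasing(g, iters=200):
    """Bisection on [0, 1] for the root of a function that increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided interval for a binomial proportion x / n."""
    # Lower bound solves P(X >= x | p) = alpha / 2; upper solves P(X <= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: (1.0 - _binom_cdf(x - 1, n, p)) - alpha / 2
    )
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: alpha / 2 - _binom_cdf(x, n, p)
    )
    return lower, upper


# Made-up stratum: 12 cases in a population of 250,000.
count, population = 12, 250_000
rate = rate_per_100k(count, population)
ci_low, ci_high = (bound * 100_000 for bound in clopper_pearson(count, population))
```

    The loop in _binom_cdf runs once per case up to the observed count, so it stays fast for typical county-level counts; very large counts would call for a beta-quantile routine instead.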

    Acknowledgements

  17. Patient Risk Profiles

    • kaggle.com
    zip
    Updated Oct 28, 2023
    Cite
    Sujay Kapadnis (2023). Patient Risk Profiles [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/patient-risk-profiles
    Available download formats: zip (17288 bytes)
    Dataset updated
    Oct 28, 2023
    Authors
    Sujay Kapadnis
    Description

    The virtual R/Pharma Conference is happening this week! To celebrate, we're exploring Patient Risk Profiles. Thank you to Jenna Reps for preparing this data!

    This dataset contains medical history features for 100 simulated patients and the predicted 1-year risk of 14 outcomes based on each patient's medical history features. The predictions used real logistic regression models developed on a large real-world healthcare dataset.

    Data Dictionary

    patient_risk_profiles.csv

    variable | class | description
    personId | integer | A unique identifier for the simulated patient
    age group: 10 - 14 | integer | A binary column where 1 means the patient is aged between 10-14 (inclusive) and 0 means the patient is not in that age group
    age group: 15 - 19 | integer | A binary column where 1 means the patient is aged between 15-19 (inclusive) and 0 means the patient is not in that age group
    age group: 20 - 24 | integer | A binary column where 1 means the patient is aged between 20-24 (inclusive) and 0 means the patient is not in that age group
    age group: 65 - 69 | integer | A binary column where 1 means the patient is aged between 65-69 (inclusive) and 0 means the patient is not in that age group
    age group: 40 - 44 | integer | A binary column where 1 means the patient is aged between 40-44 (inclusive) and 0 means the patient is not in that age group
    age group: 45 - 49 | integer | A binary column where 1 means the patient is aged between 45-49 (inclusive) and 0 means the patient is not in that age group
    age group: 55 - 59 | integer | A binary column where 1 means the patient is aged between 55-59 (inclusive) and 0 means the patient is not in that age group
    age group: 85 - 89 | integer | A binary column where 1 means the patient is aged between 85-89 (inclusive) and 0 means the patient is not in that age group
    age group: 75 - 79 | integer | A binary column where 1 means the patient is aged between 75-79 (inclusive) and 0 means the patient is not in that age group
    age group: 5 - 9 | integer | A binary column where 1 means the patient is aged between 5-9 (inclusive) and 0 means the patient is not in that age group
    age group: 25 - 29 | integer | A binary column where 1 means the patient is aged between 25-29 (inclusive) and 0 means the patient is not in that age group
    age group: 0 - 4 | integer | A binary column where 1 means the patient is aged between 0-4 (inclusive) and 0 means the patient is not in that age group
    age group: 70 - 74 | integer | A binary column where 1 means the patient is aged between 70-74 (inclusive) and 0 means the patient is not in that age group
    age group: 50 - 54 | integer | A binary column where 1 means the patient is aged between 50-54 (inclusive) and 0 means the patient is not in that age group
    age group: 60 - 64 | integer | A binary column where 1 means the patient is aged between 60-64 (inclusive) and 0 means the patient is not in that age group
    age group: 35 - 39 | integer | A binary column where 1 means the patient is aged between 35-39 (inclusive) and 0 means the patient is not in that age group
    age group: 30 - 34 | integer | A binary column where 1 means the patient is aged between 30-34 (inclusive) and 0 means the patient is not in that age group
    age group: 80 - 84 | integer | A binary column where 1 means the patient is aged between 80-84 (inclusive) and 0 means the patient is not in that age group
    age group: 90 - 94 | integer | A binary column where 1 means the patient is aged between 90-94 (inclusive) and 0 means the patient is not in that age group
    Sex = FEMALE | integer | A binary column where 1 means the patient has a female sex
    sex = MALE | integer | A binary column where 1 means the patient has a male sex
    Acetaminophen exposures in prior year | integer | A binary column where 1 means the patient had a record for acetaminophen in the prior year and 0 means they did not
    Occurrence of Alcoholism in prior year | integer | A binary column where 1 means the patient had a record for alcoholism in the prior year and 0 means they did not
    Anemia in prior year | integer | A binary column where 1 means the patient had a record for anemia in the prior year and 0 means they did not
    Angina events in prior year | integer | A binary column where 1 means the patient had a record for angina in the prior year and 0 means they did not
    ANTIEPILEPTICS in prior year | integer | A binary column where 1 means the patient had a record for a drug in the category ANTIEPILEPTICS in the prior year and 0 means they did not
    Occurrence of Anxiety in prior year | integer | A binary column where 1 means the patient had a record for anxiety in the prior year and 0 means...

  18. COVID-19 Dashboard

    • catalog.data.gov
    • healthdata.gov
    • +2 more
    Updated Oct 23, 2025
    Cite
    California Department of Public Health (2025). COVID-19 Dashboard [Dataset]. https://catalog.data.gov/dataset/covid-19-dashboard
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    The dashboard is updated each Friday.

    Laboratory surveillance data: California laboratories report SARS-CoV-2 test results to CDPH through electronic laboratory reporting. Los Angeles County SARS-CoV-2 lab data has a 7-day reporting lag. Test positivity is calculated using SARS-CoV-2 lab tests that have a specimen collection date reported during a given week. Specimens for testing are collected from patients in healthcare settings and do not reflect all testing for COVID-19 in California. Test positivity for a given week is calculated by dividing the number of positive COVID-19 results by the total number of specimens tested for that virus. Weekly laboratory surveillance data are defined as Sunday through Saturday.

    Hospitalization data: Data on COVID-19 and influenza hospital admissions are from the Centers for Disease Control and Prevention’s (CDC) National Healthcare Safety Network (NHSN) Hospitalization dataset. The requirement to report COVID-19-associated hospitalizations was effective November 1, 2024. CDPH pulls NHSN data from the CDC on the Wednesday prior to the publication of the report. Results may differ depending on which day data are pulled. Admission rates are calculated using population estimates from the P-3: Complete State and County Projections Dataset (https://dof.ca.gov/forecasting/demographics/projections/) provided by the State of California Department of Finance. Reported weekly admission rates for the entire season use the population estimates for the year the season started. For more information on NHSN data, including the protocol and data collection information, see the CDC NHSN webpage (https://www.cdc.gov/nhsn/index.html). Weekly hospitalization data are defined as Sunday through Saturday.

    Death certificate data: CDPH receives weekly year-to-date dynamic data on deaths occurring in California from the CDPH Center for Health Statistics and Informatics. These data are limited to deaths occurring among California residents and are analyzed to identify COVID-19-coded deaths. These deaths are not necessarily laboratory-confirmed and are an underestimate of all COVID-19-associated deaths in California. Weekly death data are defined as Sunday through Saturday.
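
The weekly test positivity calculation described above (positive results divided by total specimens tested, grouped into Sunday-through-Saturday weeks by specimen collection date) can be sketched roughly as follows. This is an illustration only, not CDPH's actual code: the function names and the `(collection_date, is_positive)` record layout are assumptions made for this example.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Sunday that begins the Sunday-through-Saturday week containing d."""
    # date.weekday() gives Monday=0 ... Sunday=6, so days elapsed since Sunday
    # is (weekday + 1) % 7.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_positivity(specimens):
    """Compute test positivity per week.

    specimens: iterable of (collection_date, is_positive) pairs (hypothetical layout).
    Returns a dict mapping each week's Sunday to positives / total specimens.
    """
    totals, positives = {}, {}
    for collected, is_positive in specimens:
        wk = week_start(collected)
        totals[wk] = totals.get(wk, 0) + 1
        positives[wk] = positives.get(wk, 0) + (1 if is_positive else 0)
    return {wk: positives[wk] / totals[wk] for wk in totals}


# Made-up specimens: four collected in the week of Sunday 2024-11-03, one in the next week.
sample = [
    (date(2024, 11, 3), True),
    (date(2024, 11, 5), True),
    (date(2024, 11, 6), False),
    (date(2024, 11, 9), False),
    (date(2024, 11, 10), True),
]
result = weekly_positivity(sample)
```

Grouping by collection date rather than report date follows the description's wording; the sample records are invented and carry no real surveillance values.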

  19. ARCHIVED: COVID-19 Cases and Deaths Summarized by Geography

    • catalog.data.gov
    Updated Mar 29, 2025
    Cite
    data.sfgov.org (2025). ARCHIVED: COVID-19 Cases and Deaths Summarized by Geography [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-summarized-by-geography
    Dataset updated
    Mar 29, 2025
    Dataset provided by
    data.sfgov.org
    Description

    A. SUMMARY
    Medical provider confirmed COVID-19 cases and confirmed COVID-19 related deaths in San Francisco, CA, aggregated by several different geographic areas and normalized by 2016-2020 American Community Survey (ACS) 5-year population estimates to calculate a rate per 10,000 residents.

    On September 12, 2021, a new case definition of COVID-19 was introduced that includes criteria for enumerating new infections after previous probable or confirmed infections (also known as reinfections). A reinfection is defined as a confirmed positive PCR lab test more than 90 days after a positive PCR or antigen test. The first reinfection case was identified on December 7, 2021.

    Cases and deaths are both mapped to the residence of the individual, not to where they were infected or died. For example, a person who was infected at work in San Francisco but lives in the East Bay is not counted as an SF case, and a person who dies at Zuckerberg San Francisco General but is a resident of another county is not counted in this dataset. The dataset is cumulative and covers cases going back to 3/2/2020, when testing began.

    Geographic areas summarized are:
    1. Analysis Neighborhoods
    2. Census Tracts
    3. Census Zip Code Tabulation Areas

    B. HOW THE DATASET IS CREATED
    Addresses from medical data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area. The 2016-2020 American Community Survey (ACS) population estimates provided by the Census are used to create a rate equal to ([count] / [acs_population]) * 10000, representing the number of cases per 10,000 residents.

    C. UPDATE PROCESS
    Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 7:30 Pacific Time.

    D. HOW TO USE THIS DATASET
    San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

    Privacy rules in effect
    To protect privacy, certain rules are in effect:
    1. Case counts greater than 0 and less than 10 are dropped - these will be null (blank) values
    2. Death counts greater than 0 and less than 10 are dropped - these will be null (blank) values
    3. Cases and deaths are dropped altogether for areas where acs_population < 1000

    Rate suppression in effect where counts lower than 20
    Rates are not calculated unless the case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach, as this is best practice in epidemiology.

    A note on Census ZIP Code Tabulation Areas (ZCTAs)
    ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes. Read how the Census develops ZCTAs on their website.

    Row included for Citywide case counts, incidence rate, and deaths
    A single row is included that has the Citywide case counts and incidence rate. This can be used for comparisons. Citywide will capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongo
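
For readers who want to apply the same approach to their own tabulations, the rate formula and suppression rules stated in this description could be sketched as below. This is a minimal illustration under assumptions, not SFDPH's script: the function names are hypothetical, and the order in which the rules are checked is a choice made for this example.

```python
from typing import Optional


def suppress_count(count: int, acs_population: int) -> Optional[int]:
    """Apply the privacy rules to a case or death count; None stands for a null (blank) value."""
    if acs_population < 1000:
        # Counts are dropped altogether for areas with fewer than 1,000 residents.
        return None
    if 0 < count < 10:
        # Counts greater than 0 and less than 10 are dropped.
        return None
    return count


def rate_per_10k(case_count: int, acs_population: int) -> Optional[float]:
    """Return cases per 10,000 residents, or None when the rate is suppressed."""
    published = suppress_count(case_count, acs_population)
    if published is None or published < 20:
        # Rates are not calculated unless the case count is at least 20.
        return None
    return published / acs_population * 10000


# Made-up areas: (case_count, acs_population)
examples = {
    "small count": rate_per_10k(7, 5000),
    "unstable rate": rate_per_10k(15, 5000),
    "small area": rate_per_10k(50, 800),
    "published": rate_per_10k(50, 5000),
}
```

With these invented inputs, only the last area yields a rate (50 cases among 5,000 residents, or 100 per 10,000); the other three come back as `None`.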

  20. Medicine Park, OK Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Sep 16, 2023
    Cite
    Neilsberg Research (2023). Medicine Park, OK Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis [Dataset]. https://www.neilsberg.com/research/datasets/62e48990-3d85-11ee-9abe-0aa64bf2eeb2/
    Explore at:
    json, csv (available download formats)
    Dataset updated
    Sep 16, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Medicine Park, Oklahoma
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we first analyzed and categorized the data for each of the age groups. We divided ages between 0 and 85 into roughly 5-year buckets. For ages over 85, we aggregated the data into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the Medicine Park, OK population pyramid, which represents the Medicine Park population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for Medicine Park, OK, is 19.1.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for Medicine Park, OK, is 18.2.
    • Total dependency ratio for Medicine Park, OK is 37.3.
    • Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for Medicine Park, OK is 5.5.
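
The ratios listed above can be reproduced from three age-band totals. The sketch below is an illustration only: the function name is hypothetical, the potential support ratio is read as working-age persons (15-64) per person aged 65 or over, and the population counts are made-up values chosen so that the output matches the published ratios. They are not figures from the dataset.

```python
def dependency_ratios(children_0_14: int, working_15_64: int, elderly_65_plus: int) -> dict:
    """Compute the four ratios from three age-band population totals."""
    return {
        # Children aged 0-14 per 100 persons aged 15-64.
        "youth": round(100 * children_0_14 / working_15_64, 1),
        # Persons aged 65 or over per 100 persons aged 15-64.
        "old_age": round(100 * elderly_65_plus / working_15_64, 1),
        # Both dependent groups per 100 persons aged 15-64.
        "total": round(100 * (children_0_14 + elderly_65_plus) / working_15_64, 1),
        # Persons aged 15-64 per person aged 65 or over.
        "potential_support": round(working_15_64 / elderly_65_plus, 1),
    }


# Hypothetical counts, NOT taken from the Medicine Park dataset.
ratios = dependency_ratios(children_0_14=191, working_15_64=1000, elderly_65_plus=182)
```

The total dependency ratio is the sum of the youth and old-age ratios (19.1 + 18.2 = 37.3), and the potential support ratio is 100 divided by the old-age ratio (100 / 18.2 ≈ 5.5), which is consistent with the figures listed.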
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: The age group for the Medicine Park population analysis. There are 18 expected values, defined in the age groups section above.
    • Population (Male): The male population of Medicine Park for the selected age group.
    • Population (Female): The female population of Medicine Park for the selected age group.
    • Total Population: The total population of Medicine Park for the selected age group.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to ask about the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Medicine Park Population by Age.

Cite
Department of Health Care Access and Information (2024). Medical Service Study Areas [Dataset]. https://data.chhs.ca.gov/dataset/medical-service-study-areas

Medical Service Study Areas

Explore at:
csv, html, geojson, kml, zip, arcgis geoservices rest api (available download formats)
Dataset updated
Dec 6, 2024
Dataset authored and provided by
Department of Health Care Access and Information
Description
This is the current Medical Service Study Area. California Medical Service Study Areas are created by the California Department of Health Care Access and Information (HCAI).

Check the Data Dictionary for field descriptions.


Check out the California Healthcare Atlas for more Medical Service Study Area information.

This is an update to the MSSA geometries and demographics to reflect the new 2020 Census tract data. The Medical Service Study Area (MSSA) polygon layer represents the best-fit mapping of all new 2020 California census tract boundaries to the original 2010 census tract boundaries used in the construction of the original 2010 MSSA file. Each of the state's new 9,129 census tracts was assigned to one of the previously established medical service study areas (excluding tracts with no land area), as identified in this data layer. HCAI aggregates the MSSA census tract data to create this MSSA data layer. This represents the final re-mapping of 2020 Census tracts to the original 2010 MSSA geometries. The 2010 MSSAs were based on U.S. Census 2010 data and public meetings held throughout California.


https://hcai.ca.gov/

Source of update: American Community Survey 5-year 2006-2010 data for poverty. For source tables refer to InfoUSA update procedural documentation. The 2010 MSSA Detail layer was developed to update fields affected by population change. The American Community Survey 5-year 2006-2010 population data on total population, population in households, race, ethnicity, age, and poverty were used in the update. The 2010 MSSA Census Tract Detail map layer was developed to support geographic information systems (GIS) applications, representing the 2010 census tract geography that is the foundation of the 2010 medical service study area (MSSA) boundaries. This version contains the finalized MSSA reconfiguration boundaries based on the US Census Bureau 2010 Census.

The 1976 Garamendi Rural Health Services Act required the development of a geographic framework for determining which parts of the state were rural and which were urban, and for determining which parts of counties and cities had adequate health care resources and which were "medically underserved". Thus, sub-city and sub-county geographic units called "medical service study areas [MSSAs]" were developed, using combinations of census-defined geographic units, established following General Rules promulgated by a statutory commission. After each subsequent census the MSSAs were revised. In the scheduled revisions that followed the 1990 census, community meetings of stakeholders (including county officials and representatives of hospitals and community health centers) were held in larger metropolitan areas. The meetings were designed to develop consensus on how to draw the sub-city units so as to best display health care disparities.

The importance of involving stakeholders was heightened in 1992 when the United States Department of Health and Human Services' Health Resources and Services Administration entered into a formal agreement to recognize the state-determined MSSAs as "rational service areas" for federal recognition of "health professional shortage areas" and "medically underserved areas". After the 2000 census, two innovations transformed the process and set the stage for GIS to emerge as a major factor in health care resource planning in California. First, the Office of Statewide Health Planning and Development [OSHPD], which organizes the community stakeholder meetings and provides the staff to administer the MSSAs, entered into an Enterprise GIS contract. Second, OSHPD authorized at least one community meeting to be held in each of the 58 counties, a significant number of which were wholly rural or frontier counties. For populous Los Angeles County, 11 community meetings were held. As a result, health resource data in California are collected and organized by 541 geographic units. The boundaries of these units were established by community healthcare experts, with the objective of maximizing their usefulness for needs assessment purposes. The most dramatic consequence was the introduction of data displayed simultaneously in a GIS format. A two-person team, incorporating healthcare policy and GIS expertise, conducted the series of meetings and supervised the development of the 2000-census configuration of the MSSAs.

MSSA Configuration Guidelines (General Rules):

• Each MSSA is composed of one or more complete census tracts.
• As a general rule, MSSAs are deemed to be "rational service areas [RSAs]" for purposes of designating health professional shortage areas [HPSAs], medically underserved areas [MUAs] or medically underserved populations [MUPs].
• MSSAs will not cross county lines.
• To the extent practicable, all census-defined places within the MSSA are within 30 minutes travel time to the largest population center within the MSSA, except in those circumstances where meeting this criterion would require splitting a census tract.
• To the extent practicable, areas that, standing alone, would meet both the definition of an MSSA and a Rural MSSA, should not be a part of an Urban MSSA.
• Any Urban MSSA whose population exceeds 200,000 shall be divided into two or more Urban MSSA Subdivisions.
• Urban MSSA Subdivisions should be within a population range of 75,000 to 125,000, but may not be smaller than five square miles in area. If removing any census tract on the perimeter of the Urban MSSA Subdivision would cause the area to fall below five square miles in area, then the population of the Urban MSSA may exceed 125,000.
• To the extent practicable, Urban MSSA Subdivisions should reflect recognized community and neighborhood boundaries and take into account such demographic information as income level and ethnicity.

Rural Definitions: A rural MSSA is an MSSA adopted by the Commission, which has a population density of less than 250 persons per square mile, and which has no census defined place within the area with a population in excess of 50,000. Only the population that is located within the MSSA is counted in determining the population of the census defined place. A frontier MSSA is a rural MSSA adopted by the Commission which has a population density of less than 11 persons per square mile. Any MSSA which is not a rural or frontier MSSA is an urban MSSA.
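
The rural, frontier, and urban definitions above amount to a small set of threshold checks, sketched below. This is a hedged illustration, not HCAI's implementation: the function and parameter names are hypothetical, the inputs are made-up values, and the sketch assumes the largest place population passed in already counts only residents located inside the MSSA.

```python
def classify_mssa(population: int, area_sq_miles: float, largest_place_population: int) -> str:
    """Classify an MSSA as 'frontier', 'rural', or 'urban' from the stated thresholds."""
    density = population / area_sq_miles  # persons per square mile
    # Rural: density below 250 and no census-defined place above 50,000 residents.
    is_rural = density < 250 and largest_place_population <= 50_000
    if is_rural and density < 11:
        # Frontier: a rural MSSA with fewer than 11 persons per square mile.
        return "frontier"
    if is_rural:
        return "rural"
    return "urban"


def needs_subdivision(classification: str, population: int) -> bool:
    """An Urban MSSA whose population exceeds 200,000 is divided into subdivisions."""
    return classification == "urban" and population > 200_000


# Made-up inputs for illustration only.
frontier = classify_mssa(5_000, 1_000, 1_200)       # 5 persons per square mile
rural = classify_mssa(100_000, 1_000, 30_000)       # 100 persons per square mile
urban = classify_mssa(100_000, 1_000, 60_000)       # low density, but a place over 50,000
large_urban = classify_mssa(300_000, 100, 250_000)  # 3,000 persons per square mile
```

The third example shows why both conditions matter: a sparsely populated area still counts as urban when it contains a census-defined place of more than 50,000 residents.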
Last updated December 6th 2024.