100+ datasets found
  1. Health Insurance Marketplace

    • kaggle.com
    zip
    Updated May 1, 2017
    US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/datasets/hhs/health-insurance-marketplace
    Available download formats: zip (868,821,924 bytes)
    Dataset updated
    May 1, 2017
    Dataset provided by
    United States Department of Health and Human Services (http://www.hhs.gov/)
    Authors
    US Department of Health and Human Services
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.

    [Figure: median plan premiums]

    Exploration Ideas

    To help get you started, here are some data exploration ideas:

    • How do plan rates and benefits vary across states?
    • How do plan benefits relate to plan rates?
    • How do plan rates vary by age?
    • How do plans vary across insurance network providers?

    See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

    Data Description

    This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

    Here, we've processed the data to facilitate analytics. This processed version has three components:

    1. Original versions of the data

    The original versions of the 2014, 2015, 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

    2. Combined CSV files

    In the top level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

    • BenefitsCostSharing.csv
    • BusinessRules.csv
    • Network.csv
    • PlanAttributes.csv
    • Rate.csv
    • ServiceArea.csv

    Additionally, there are two CSV files that facilitate joining data across years:

    • Crosswalk2015.csv - joining 2014 and 2015 data
    • Crosswalk2016.csv - joining 2015 and 2016 data
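    Chaining the two crosswalk files lets a 2014 plan be followed through to 2016. The sketch below uses only Python's standard library and assumes hypothetical column names (`PlanID_2014`, `PlanID_2015`, `PlanID_2016`) and made-up plan IDs; check the real headers in Crosswalk2015.csv and Crosswalk2016.csv before adapting it.

```python
import csv
import io

# Hypothetical in-memory stand-ins for Crosswalk2015.csv and Crosswalk2016.csv.
# Column names and plan IDs are assumptions for illustration only.
CROSSWALK_2015 = """PlanID_2014,PlanID_2015
A-2014,A-2015
B-2014,B-2015
"""
CROSSWALK_2016 = """PlanID_2015,PlanID_2016
A-2015,A-2016
"""


def load_mapping(text, src_col, dst_col):
    """Read a crosswalk CSV into a {source plan ID: destination plan ID} dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[src_col]: row[dst_col] for row in reader}


def chain_2014_to_2016(cw15, cw16):
    """Follow 2014 -> 2015 -> 2016; plans with no 2016 successor map to None."""
    return {p14: cw16.get(p15) for p14, p15 in cw15.items()}


cw15 = load_mapping(CROSSWALK_2015, "PlanID_2014", "PlanID_2015")
cw16 = load_mapping(CROSSWALK_2016, "PlanID_2015", "PlanID_2016")
print(chain_2014_to_2016(cw15, cw16))  # {'A-2014': 'A-2016', 'B-2014': None}
```

    If a plan can map to several successors, a dict keeps only the last one; a list of (old, new) pairs would keep them all.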

    3. SQLite database

    The "database.sqlite" file contains tables corresponding to each of the processed CSV files.
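    A minimal sketch of querying the SQLite file with Python's built-in `sqlite3` module, aimed at the "how do plan rates vary across states?" idea. To stay self-contained it builds a tiny in-memory table; the table name `Rate` follows the CSV name above, but the columns `StateCode` and `IndividualRate` and all values are assumptions (check the data dictionaries). SQLite has no built-in median aggregate, so the median is computed in Python.

```python
import sqlite3
from collections import defaultdict
from statistics import median


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def median_rate_by_state(conn):
    """Median of the (assumed) IndividualRate column, grouped by StateCode."""
    rates = defaultdict(list)
    for state, rate in conn.execute("SELECT StateCode, IndividualRate FROM Rate"):
        rates[state].append(rate)
    return {state: median(values) for state, values in sorted(rates.items())}


# In-memory stand-in; with the real download use sqlite3.connect("database.sqlite").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rate (StateCode TEXT, IndividualRate REAL)")
conn.executemany(
    "INSERT INTO Rate VALUES (?, ?)",
    [("AK", 300.0), ("AK", 500.0), ("AK", 400.0), ("TX", 250.0), ("TX", 350.0)],
)
print(list_tables(conn))           # ['Rate']
print(median_rate_by_state(conn))  # {'AK': 400.0, 'TX': 300.0}
```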

    The code to create the processed version of this data is available on GitHub.

  2. Health Insurance Claims

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 10, 2024
    Gideon, Gideon (2024). Health Insurance Claims [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13289813
    Dataset updated
    Aug 10, 2024
    Dataset authored and provided by
    Gideon, Gideon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is suitable for exploring health insurance fraud claims using machine learning algorithms. It is well suited for students developing ML models to predict healthcare insurance claims fraud.

  3. Data from: Insurance Claim Dataset

    • cubig.ai
    Updated Jun 30, 2025
    CUBIG (2025). Insurance Claim Dataset [Dataset]. https://cubig.ai/store/products/540/insurance-claim-dataset
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction

    • The Insurance Claim Dataset is a tabular dataset collected to predict whether an insurance claim will be made (yes/no) based on information such as the policyholder’s age, gender, BMI, average daily steps, number of children, smoking status, residential region, and medical charges billed by health insurance.

    2) Data Utilization

    (1) Characteristics of the Insurance Claim Dataset:
    • The dataset integrates various factors such as health status, lifestyle habits, and demographic characteristics, making it suitable for practical use in insurance risk prediction and customer segmentation.

    (2) Applications of the Insurance Claim Dataset:
    • Development of Insurance Claim Prediction Models: The dataset can be used to develop machine learning models that classify whether an insurance claim will be filed based on multiple input features.
    • Insurance Product Development and Risk Assessment: By analyzing the probability of claims for different customer profiles, the dataset can be used for product design, risk management, and premium pricing in practical policy planning.

  4. Sample Insurance Claim Prediction Dataset

    • kaggle.com
    Updated Jun 4, 2018
    Eason (2018). Sample Insurance Claim Prediction Dataset [Dataset]. https://www.kaggle.com/easonlai/sample-insurance-claim-prediction-dataset/code
    Dataset updated
    Jun 4, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Eason
    Description

    Content

    This is the "Sample Insurance Claim Prediction Dataset", which is based on the "Medical Cost Personal Datasets" with updated sample values.

    • age: age of the policyholder
    • sex: gender of the policyholder (female=0, male=1)
    • bmi: body mass index, an objective index of body weight relative to height, computed as weight divided by height squared (kg/m^2); ideally between 18.5 and 25
    • steps: average walking steps per day of the policyholder
    • children: number of children/dependents of the policyholder
    • smoker: smoking status of the policyholder (non-smoker=0, smoker=1)
    • region: the policyholder's residential area in the US (northeast=0, northwest=1, southeast=2, southwest=3)
    • charges: individual medical costs billed by health insurance
    • insuranceclaim: insurance claim (yes=1, no=0)
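    Because sex, smoker, region and insuranceclaim are stored as integers, a small lookup step makes records readable. The sketch below uses the codes exactly as listed above and the standard BMI formula (kg divided by m squared); the sample record is hypothetical and not taken from the dataset.

```python
# Integer codes as documented in the field list above.
SEX = {0: "female", 1: "male"}
SMOKER = {0: "non-smoker", 1: "smoker"}
REGION = {0: "northeast", 1: "northwest", 2: "southeast", 3: "southwest"}
CLAIM = {0: "no", 1: "yes"}


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def decode(row):
    """Return a copy of one record with its coded columns replaced by labels."""
    out = dict(row)
    out["sex"] = SEX[row["sex"]]
    out["smoker"] = SMOKER[row["smoker"]]
    out["region"] = REGION[row["region"]]
    out["insuranceclaim"] = CLAIM[row["insuranceclaim"]]
    return out


# Hypothetical record for illustration only (not a row from the dataset).
record = {"age": 30, "sex": 1, "bmi": 24.2, "steps": 8000, "children": 1,
          "smoker": 0, "region": 2, "charges": 3500.0, "insuranceclaim": 1}
print(decode(record)["region"])  # southeast
print(round(bmi(70, 1.75), 1))   # 22.9
```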

  5. Health Insurance Dataset

    • data.mendeley.com
    Updated Jul 16, 2024
    Prakash M C (2024). Health Insurance Dataset [Dataset]. http://doi.org/10.17632/jx5tddtcs6.1
    Dataset updated
    Jul 16, 2024
    Authors
    Prakash M C
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains information about more than 1,300 beneficiaries.

  6. Health Insurance data: Policy and claims data

    • kaggle.com
    Updated Sep 22, 2020
    AnkushAgarwal (2020). Health Insurance data: Policy and claims data [Dataset]. https://www.kaggle.com/ankush89/health-insurance-data-policy-and-claims-data
    Dataset updated
    Sep 22, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    AnkushAgarwal
    Description

    Dataset

    This dataset was created by AnkushAgarwal

    Contents

  7. Health Insurance Dataset

    • kaggle.com
    Updated Jul 5, 2025
    Mohamadreza Momeni (2025). Health Insurance Dataset [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/health-insurance-dataset
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    Kaggle
    Authors
    Mohamadreza Momeni
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Medical Insurance Expenses & Premium Dataset

    This dataset captures demographic and financial information related to medical insurance policyholders. It includes key features such as age, gender, BMI, number of children, discount eligibility status, and the geographic region of the insured. The dataset also provides the actual medical expenses incurred (expenses) and the insurance premium charged (premium).

    The purpose of this dataset is to support research and development of machine learning models for predicting healthcare costs, optimizing pricing strategies, and understanding factors that influence insurance expenses and premiums.

    Columns

    age: Age of the policyholder

    gender: Gender (male/female)

    bmi: Body Mass Index

    children: Number of children covered by the insurance

    discount_eligibility: Whether the policyholder is eligible for a discount (yes/no)

    region: Geographic region (e.g., southeast, northwest)

    expenses: Actual medical costs incurred by the policyholder (Target number 1)

    premium: Insurance premium charged (Target number 2)

    Example Use Cases

    Predicting insurance expenses for new applicants

    Analyzing which demographic factors contribute most to higher premiums

    Exploring correlations between BMI, age, and healthcare costs

    Developing regression and classification models for pricing optimization

  8. Dataset for "Public health insurance coverage in India before and after...

    • figshare.com
    bin
    Updated Aug 10, 2023
    Sanjay K Mohanty; Ashish Kumar Upadhyay; Suraj Maiti; Radhe Shyam Mishra; Fabrice Kämpfen; Jürgen Maurer; Owen O'Donell (2023). Dataset for "Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data" [Dataset]. http://doi.org/10.6084/m9.figshare.23919078.v1
    Available download formats: bin
    Dataset updated
    Aug 10, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sanjay K Mohanty; Ashish Kumar Upadhyay; Suraj Maiti; Radhe Shyam Mishra; Fabrice Kämpfen; Jürgen Maurer; Owen O'Donell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data

    The National Family Health Survey (NFHS), India data is a publicly available data set that can be accessed on request: it can be downloaded after registering on the Demographic and Health Survey (DHS) website at The DHS Program - Request Access To Datasets. We used data from the fourth and fifth rounds of the NFHS, which can be accessed after registration at https://dhsprogram.com/data/dataset/India_Standard-DHS_2015.cfm?flag=0 (NFHS 4) and https://dhsprogram.com/data/dataset/India_Standard-DHS_2020.cfm?flag=0 (NFHS 5). These datasets (HR file) were used to obtain the combined dataset for the paper "Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data", submitted to BMJ Global Health in August 2023.

  9. Health Insurance Coverage - Datasets - CTData.org

    • data.ctdata.org
    Updated Mar 16, 2016
    (2016). Health Insurance Coverage - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/health-insurance-coverage
    Dataset updated
    Mar 16, 2016
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Health Insurance Coverage reports the prevalence of health insurance coverage disaggregated by age group.

  10. Dataset of health insurance portfolio

    • data.mendeley.com
    Updated Dec 6, 2024
    Josep Lledó (2024). Dataset of health insurance portfolio [Dataset]. http://doi.org/10.17632/386vmj2tbk.1
    Dataset updated
    Dec 6, 2024
    Authors
    Josep Lledó
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data is formatted as a spreadsheet covering the main activities of a non-life health insurance portfolio over three full years (2017, 2018 and 2019). The dataset comprises 228,711 rows and 42 columns: each row represents one insured (individual) policy, and each column represents a distinct variable.

  11. Insurance Premium and Claims Data by Class of Insurance, Alberta, 2013

    • open.canada.ca
    • data.wu.ac.at
    csv, html, xlsx
    Updated Jul 24, 2024
    Government of Alberta (2024). Insurance Premium and Claims Data by Class of Insurance, Alberta, 2013 [Dataset]. https://open.canada.ca/data/en/dataset/34eb85a2-1558-46b7-adca-a40c446cb05f
    Available download formats: xlsx, csv, html
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Government of Alberta
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Time period covered
    Jan 1, 2013 - Dec 31, 2013
    Area covered
    Alberta
    Description

    Data provided by insurers, on the premiums written and claims incurred for the 2013 fiscal year. Based on reporting on the consolidated pages of the P&C-1 or Life-1 Annual returns. This data is also reported in the Superintendent of Insurance’s Annual Report.
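    Since the file pairs premiums written with claims incurred for each class of insurance, a natural first calculation is a claims-to-premiums ratio per class. This is only a rough proxy for a loss ratio, which is conventionally computed on earned rather than written premiums. The field names, class names and amounts below are hypothetical, not values from the Alberta data.

```python
def claims_to_premiums(rows):
    """Map each class of insurance to claims incurred / premiums written.

    Classes with zero premiums written are skipped to avoid division by zero.
    """
    return {
        row["class"]: row["claims_incurred"] / row["premiums_written"]
        for row in rows
        if row["premiums_written"]
    }


# Hypothetical rows; field names and amounts are illustrative only.
rows = [
    {"class": "Auto", "premiums_written": 200.0, "claims_incurred": 150.0},
    {"class": "Property", "premiums_written": 100.0, "claims_incurred": 40.0},
    {"class": "Hail", "premiums_written": 0.0, "claims_incurred": 5.0},
]
print(claims_to_premiums(rows))  # {'Auto': 0.75, 'Property': 0.4}
```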

  12. Commercial Medical Insurance (MSCANCC) - Vision and Eye Health Surveillance

    • catalog.data.gov
    • data.virginia.gov
    Updated Jul 11, 2025
    Centers for Disease Control and Prevention (2025). Commercial Medical Insurance (MSCANCC) - Vision and Eye Health Surveillance [Dataset]. https://catalog.data.gov/dataset/commercial-medical-insurance-mscancc-vision-and-eye-health-surveillance
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    This dataset is a de-identified summary table of prevalence rates for vision and eye health data indicators from the 2016 MarketScan® Commercial Claims and Encounters Data (CCAE), which is produced by Truven Health Analytics, a division of IBM Watson Health. The CCAE data contain a convenience sample of insurance claims information from persons with employer-sponsored insurance and their dependents, including 43.6 million person-years of data. Prevalence estimates are stratified by all available combinations of age group, gender, and state. Detailed information on VEHSS MarketScan analyses can be found on the VEHSS MarketScan webpage (cdc.gov/visionhealth/vehss/data/claims/marketscan.html). Information on available Medicare claims data can be found on the IBM MarketScan website (https://marketscan.truvenhealth.com). The VEHSS MarketScan summary dataset was last updated in November 2019.
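    The prevalence estimates are stratified by every available combination of age group, gender and state. As a sketch of the arithmetic only, the code below computes a prevalence rate (persons with the condition divided by all persons) for each (age group, gender, state) stratum from person-level records. The record layout and values are hypothetical; the published VEHSS table already holds the aggregated rates.

```python
from collections import Counter


def prevalence_by_stratum(records):
    """Return {(age_group, gender, state): cases / persons} for each stratum."""
    persons = Counter()
    cases = Counter()
    for r in records:
        key = (r["age_group"], r["gender"], r["state"])
        persons[key] += 1
        cases[key] += r["has_condition"]  # True adds 1, False adds 0
    return {key: cases[key] / persons[key] for key in persons}


# Hypothetical person-level records, for illustration only.
records = [
    {"age_group": "40-64", "gender": "F", "state": "VA", "has_condition": True},
    {"age_group": "40-64", "gender": "F", "state": "VA", "has_condition": False},
    {"age_group": "40-64", "gender": "F", "state": "VA", "has_condition": False},
    {"age_group": "40-64", "gender": "F", "state": "VA", "has_condition": True},
    {"age_group": "18-39", "gender": "M", "state": "GA", "has_condition": False},
]
print(prevalence_by_stratum(records))
# {('40-64', 'F', 'VA'): 0.5, ('18-39', 'M', 'GA'): 0.0}
```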

  13. Health Care Insurance Report Type Codes

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    John Snow Labs (2021). Health Care Insurance Report Type Codes [Dataset]. https://www.johnsnowlabs.com/marketplace/health-care-insurance-report-type-codes/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    Healthcare Insurance Report Type Codes is a dataset that defines the codes identifying the type of report described in an insurance claim; these codes are transmitted in 005010X306, loop 2300, REF03. The dataset also lists each report type code with its description, start and modified dates, and status (active, to be deactivated, or deactivated).
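    The codes are carried in element REF03 of a REF segment. As a minimal sketch, assuming the common X12 delimiters (`*` between elements and `~` after each segment; a real interchange declares its own delimiters in its ISA header), the code below pulls REF03 from every REF segment. It does not track loops, so it does not check that a segment sits in loop 2300, and the sample segments and values are hypothetical placeholders rather than real report type codes.

```python
def get_element(segment, tag, position, element_sep="*"):
    """Return element `position` (e.g. 3 for REF03) if the segment has this tag."""
    parts = segment.split(element_sep)
    if parts[0] != tag or position >= len(parts):
        return None
    return parts[position]


def ref03_values(interchange, segment_terminator="~", element_sep="*"):
    """Collect the REF03 value from every REF segment in an X12 string."""
    values = []
    for segment in interchange.split(segment_terminator):
        value = get_element(segment.strip(), "REF", 3, element_sep)
        if value is not None:
            values.append(value)
    return values


# Hypothetical fragment; qualifiers and values are placeholders, not real codes.
sample = "LX*1~REF*ZZ*12345*EXAMPLE~REF*XX*678~"
print(ref03_values(sample))  # ['EXAMPLE']
```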

  14. Medicaid Claims (MAX) - Vision and Eye Health Surveillance

    • catalog.data.gov
    • data.virginia.gov
    Updated May 16, 2025
    Centers for Disease Control and Prevention (2025). Medicaid Claims (MAX) - Vision and Eye Health Surveillance [Dataset]. https://catalog.data.gov/dataset/medicaid-claims-max-vision-and-eye-health-surveillance
    Dataset updated
    May 16, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    2016-2019. This dataset is a de-identified summary table of prevalence rates for vision and eye health data indicators from the Medicaid Analytic eXtract (MAX) data. Medicaid MAX data are a set of de-identified person-level data files with information on Medicaid eligibility, service utilization, diagnoses, and payments. The MAX data contain a convenience sample of claims processed by Medicaid and Children’s Health Insurance Program (CHIP) fee-for-service and managed care plans. Not all states are included in MAX in all years, and as of November 2019, 2014 data are the latest available. Prevalence estimates are stratified by all available combinations of age group, gender, and state. Detailed information on VEHSS Medicaid MAX analyses can be found on the VEHSS Medicaid MAX webpage (cdc.gov/visionhealth/vehss/data/claims/medicaid.html). Information on available Medicare claims data can be found on the ResDac website (www.resdac.org). The VEHSS Medicaid MAX dataset was last updated in May 2023.

  15. Synthetic Healthcare Database for Research (SyH-DR)

    • catalog.data.gov
    • healthdata.gov
    Updated Sep 16, 2023
    Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
    Description

    The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.
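    SyH-DR's claim-level synthetic elements are described as resembling the marginal distribution of the original elements. AHRQ's actual synthesis method is not given here; the sketch below shows only the simplest way to match marginals, which is to resample each column independently from its own observed values. Single-column distributions are preserved in expectation, but relationships between columns are not. The column names and values are hypothetical.

```python
import random


def synthesize_marginals(rows, n, seed=0):
    """Draw n synthetic rows by sampling each column independently.

    Every column is resampled with replacement from its own observed values,
    so marginal distributions are matched in expectation, while correlations
    between columns are not preserved.
    """
    rng = random.Random(seed)
    columns = list(rows[0])
    pools = {c: [row[c] for row in rows] for c in columns}
    return [{c: rng.choice(pools[c]) for c in columns} for _ in range(n)]


# Hypothetical claim-level data, for illustration only.
claims = [
    {"claim_type": "inpatient", "paid": 12000},
    {"claim_type": "outpatient", "paid": 300},
    {"claim_type": "outpatient", "paid": 150},
    {"claim_type": "pharmacy", "paid": 40},
]
synthetic = synthesize_marginals(claims, n=1000)
share = sum(r["claim_type"] == "outpatient" for r in synthetic) / len(synthetic)
print(round(share, 2))  # should land near 0.5, the observed outpatient share
```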

  16. Training, validation and test datasets and model files for larger US Health...

    • ufs.figshare.com
    txt
    Updated Dec 12, 2023
    Jan Marthinus Blomerus (2023). Training, validation and test datasets and model files for larger US Health Insurance dataset [Dataset]. http://doi.org/10.38140/ufs.24598881.v2
    Available download formats: txt
    Dataset updated
    Dec 12, 2023
    Dataset provided by
    University of the Free State
    Authors
    Jan Marthinus Blomerus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Formats1.xlsx describes the columns of the following datasets. The training, validation and test datasets together contain all the records. sens1.csv and meansdX.csv are required for testing.

  17. Health Insurance Dataset

    • figshare.com
    csv
    Updated Mar 11, 2025
    Prakash M C (2025). Health Insurance Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28571408.v1
    Available download formats: csv
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    figshare
    Authors
    Prakash M C
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains expense and premium details related to health insurance.

  18. ‘US Health Insurance Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 15, 2021
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘US Health Insurance Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-health-insurance-dataset-8b56/068994aa/?iid=012-655&v=presentation
    Dataset updated
    Nov 15, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘US Health Insurance Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/teertha/ushealthinsurancedataset on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    The venerable insurance industry is no stranger to data driven decision making. Yet in today's rapidly transforming digital landscape, Insurance is struggling to adapt and benefit from new technologies compared to other industries, even within the BFSI sphere (compared to the Banking sector for example.) Extremely complex underwriting rule-sets that are radically different in different product lines, many non-KYC environments with a lack of centralized customer information base, complex relationship with consumers in traditional risk underwriting where sometimes customer centricity runs reverse to business profit, inertia of regulatory compliance - are some of the unique challenges faced by Insurance Business.

    Despite this, emergent technologies like AI and Block Chain have brought a radical change in Insurance, and Data Analytics sits at the core of this transformation. We can identify 4 key factors behind the emergence of Analytics as a crucial part of InsurTech:

    • Big Data: The explosion of unstructured data in the form of images, videos, text, emails, social media
    • AI: The recent advances in Machine Learning and Deep Learning that can enable businesses to gain insight, do predictive analytics and build cost- and time-efficient innovative solutions
    • Real time Processing: Ability of real time information processing through various data feeds (for ex. social media, news)
    • Increased Computing Power: a complex ecosystem of new analytics vendors and solutions that enable carriers to combine data sources, external insights, and advanced modeling techniques in order to glean insights that were not possible before.

    This dataset can support a simple yet illuminating study of risk underwriting in Health Insurance: the interplay of various attributes of the insured and how they affect the insurance premium.

    Content

    This dataset contains 1338 rows of insured data, where the Insurance charges are given against the following attributes of the insured: Age, Sex, BMI, Number of Children, Smoker and Region. There are no missing or undefined values in the dataset.

    Inspiration

    This relatively simple dataset should be an excellent starting point for EDA, Statistical Analysis and Hypothesis testing and training Linear Regression models for predicting Insurance Premium Charges.

    Proposed Tasks:
    • Exploratory Data Analytics
    • Statistical hypothesis testing
    • Statistical Modeling
    • Linear Regression

    --- Original source retains full ownership of the source dataset ---
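    As a starting point for the proposed linear-regression task, the sketch below fits a one-variable ordinary least squares line with the closed-form estimates: slope = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2) and intercept = ȳ - slope · x̄. Using age as the single predictor of charges is an assumption for illustration, and the numbers are toy values, not rows from the dataset; a fuller model would use all six attributes, with Sex, Smoker and Region encoded as indicator variables.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy (age, charges) pairs lying exactly on charges = 1000 + 250 * age.
ages = [20, 30, 40, 50]
charges = [6000, 8500, 11000, 13500]
intercept, slope = fit_simple_ols(ages, charges)
print(intercept, slope)  # 1000.0 250.0
```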

  19. ‘Medical Insurance dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Medical Insurance dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-medical-insurance-dataset-b194/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Medical Insurance dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rajgupta2019/medical-insurance-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    People are always confused about their medical insurance and don't know the cost of insurance at different ages and conditions. This data is useful for these people and is useful to make predictions of the insurance cost they will have to pay.

    Content

    The data provider is unknown and all credit goes to the person. Data may not be sufficient for practical purpose and is solely for education and practice.

    Acknowledgements

    Data collection is one thing, and data cleaning and preprocessing is another. The resources on YouTube are enough to learn these basics.

    Inspiration

    The KAGGLE community is very inspiring and is the best way to learn everything we need to know in Data Science and I love it.

    --- Original source retains full ownership of the source dataset ---

  20. Data from: Associations between environmental quality and adult asthma...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    U.S. EPA Office of Research and Development (ORD) (2020). Associations between environmental quality and adult asthma prevalence in medical claims data [Dataset]. https://catalog.data.gov/dataset/associations-between-environmental-quality-and-adult-asthma-prevalence-in-medical-claims-d
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The MarketScan health claims database is a compilation of nearly 110 million patient records with information from more than 100 private insurance carriers and large self-insuring companies. Public forms of insurance (i.e., Medicare and Medicaid) are not included, nor are small (< 100 employees) or medium (1000 employees). We excluded the relatively few (n=6735) individuals over 65 years of age because Medicare is the primary insurance of U.S. adults over 65. The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

    This dataset is associated with the following publication: Gray, C., D. Lobdell, K. Rappazzo, Y. Jian, J. Jagai, L. Messer, A. Patel, S. Deflorio-Barker, C. Lyttle, J. Solway, and A. Rzhetsky. Associations between environmental quality and adult asthma prevalence in medical claims data. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 166: 529-536, (2018).
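    The EQI construction described above is a two-stage PCA: one PCA per domain reduces that domain's variables to a score, then a second PCA runs over the domain scores. The pure-Python sketch below mirrors only that structure. It is not EPA's code; keeping just the first principal component at each stage (found by power iteration on the covariance matrix), mean-centring without standardizing, and the two-domain toy data are all assumptions for illustration. A component's sign is arbitrary, and power iteration fails if the start vector happens to be orthogonal to the leading eigenvector.

```python
def covariance_matrix(rows):
    """Return (sample covariance matrix, mean-centred rows) for numeric rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centred = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(k)]
           for a in range(k)]
    return cov, centred


def first_component(rows, iterations=200):
    """Leading covariance eigenvector via power iteration, plus row scores."""
    cov, centred = covariance_matrix(rows)
    k = len(cov)
    v = [1.0 / (j + 1) for j in range(k)]  # fixed, non-uniform start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Score of each row = projection of its centred values onto the component.
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centred]
    return v, scores


# Stage 1: one PCA per domain (hypothetical county-level variables).
air = [[1.0, 2.0], [2.0, 4.1], [3.0, 6.0], [4.0, 8.2]]
water = [[5.0, 1.0], [4.0, 2.0], [3.0, 2.9], [2.0, 4.0]]
_, air_score = first_component(air)
_, water_score = first_component(water)

# Stage 2: a second PCA over the per-domain scores gives an overall index.
domain_scores = [list(pair) for pair in zip(air_score, water_score)]
loadings, overall_index = first_component(domain_scores)
print([round(x, 3) for x in overall_index])
```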
