21 datasets found
  1. Health Insurance Marketplace

    • kaggle.com
    zip
    Updated May 1, 2017
    Cite
    US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/datasets/hhs/health-insurance-marketplace
    Available download formats: zip (868821924 bytes)
    Dataset updated
    May 1, 2017
    Dataset provided by
    United States Department of Health and Human Services (http://www.hhs.gov/)
    Authors
    US Department of Health and Human Services
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.

    median plan premiums

    Exploration Ideas

    To help get you started, here are some data exploration ideas:

    • How do plan rates and benefits vary across states?
    • How do plan benefits relate to plan rates?
    • How do plan rates vary by age?
    • How do plans vary across insurance network providers?

    See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

    Data Description

    This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

    Here, we've processed the data to facilitate analytics. This processed version has three components:

    1. Original versions of the data

    The original versions of the 2014, 2015, and 2016 data are available in the "raw" directory of the download and in "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

    2. Combined CSV files that contain

    In the top-level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

    • BenefitsCostSharing.csv
    • BusinessRules.csv
    • Network.csv
    • PlanAttributes.csv
    • Rate.csv
    • ServiceArea.csv

    Additionally, there are two CSV files that facilitate joining data across years:

    • Crosswalk2015.csv - joining 2014 and 2015 data
    • Crosswalk2016.csv - joining 2015 and 2016 data

    3. SQLite database

    The "database.sqlite" file contains tables corresponding to each of the processed CSV files.
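As a minimal sketch of how the SQLite component might be inspected, the following uses Python's standard `sqlite3` module to list the tables in a database. The in-memory database and the two placeholder tables created here are stand-ins for illustration only. With the actual download you would presumably connect to `database.sqlite` instead, and the real table names and schemas should be checked rather than assumed.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the connected database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in database so the sketch runs without the download.
# For the real file, use: conn = sqlite3.connect("database.sqlite")
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Rate (placeholder TEXT)")     # hypothetical schema
demo.execute("CREATE TABLE Network (placeholder TEXT)")  # hypothetical schema

print(list_tables(demo))
```

Listing the tables first is a cheap way to confirm that each processed CSV file really has a counterpart table before writing queries against it.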

    The code to create the processed version of this data is available on GitHub.

  2. Medical_cost_dataset

    • kaggle.com
    Updated Aug 19, 2023
    Cite
    Nandita Pore (2023). Medical_cost_dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/medical-cost-dataset
    Dataset updated
    Aug 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nandita Pore
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.

    Columns:

    1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis.
    2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses.
    3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences.
    4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs.
    5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses.
    6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs.
    7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure.
    8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.
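To make the column description concrete, here is a minimal sketch that parses CSV text with those eight columns into typed records using only the Python standard library. The header spellings, the "yes"/"no" encoding of Smoker, and the two sample rows are assumptions made for illustration. They are not taken from the dataset, so check the real file before relying on them.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Record:
    id: int
    age: int
    sex: str
    bmi: float
    children: int
    smoker: bool
    region: str
    charges: float


def parse_records(text: str) -> list[Record]:
    """Parse CSV text with the eight described columns into typed records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append(
            Record(
                id=int(row["ID"]),
                age=int(row["Age"]),
                sex=row["Sex"],
                bmi=float(row["BMI"]),
                children=int(row["Children"]),
                smoker=row["Smoker"].strip().lower() == "yes",  # assumed encoding
                region=row["Region"],
                charges=float(row["Charges"]),
            )
        )
    return records


# Invented rows for illustration only; not taken from the dataset.
SAMPLE = """ID,Age,Sex,BMI,Children,Smoker,Region,Charges
1,30,female,22.5,0,no,northeast,1000.0
2,45,male,31.0,2,yes,southwest,2500.5
"""

records = parse_records(SAMPLE)
print(records[1].smoker, records[1].charges)
```

Converting each field to its intended type up front means a malformed value fails loudly at load time rather than silently skewing a later analysis of Charges.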

    Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.

    Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.

  3. medical-insurance-charges-dataset

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    F (2025). medical-insurance-charges-dataset [Dataset]. https://huggingface.co/datasets/affnanation/medical-insurance-charges-dataset
    Dataset updated
    Jul 15, 2025
    Authors
    F
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset: Medical Insurance Cost

    This is the dataset used to train and evaluate the health insurance cost prediction model for the RiskGuard project. The main code repository can be found on GitHub.

      Dataset Description
    

    This dataset originates from Kaggle (Medical Cost Personal Datasets) and contains demographic and personal attributes of insurance customers. It is used to predict individual medical costs.

      Data Columns
    

    age: Age of the primary beneficiary… See the full description on the dataset page: https://huggingface.co/datasets/affnanation/medical-insurance-charges-dataset.

  4. ‘Medical Insurance dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Medical Insurance dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-medical-insurance-dataset-b194/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Medical Insurance dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rajgupta2019/medical-insurance-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    People are always confused about their medical insurance and don't know the cost of insurance at different ages and conditions. This data is useful for these people and is useful to make predictions of the insurance cost they will have to pay.

    Content

    The data provider is unknown, and all credit goes to that person. The data may not be sufficient for practical purposes and is solely for education and practice.

    Acknowledgements

    Data collection is one thing, and data cleaning and preprocessing is another. The resources on YouTube are enough to learn these basics.

    Inspiration

    The KAGGLE community is very inspiring and is the best way to learn everything we need to know in Data Science and I love it.

    --- Original source retains full ownership of the source dataset ---

  5. Sample Insurance Claim Prediction Dataset

    • kaggle.com
    Updated Jun 4, 2018
    Cite
    Eason (2018). Sample Insurance Claim Prediction Dataset [Dataset]. https://www.kaggle.com/easonlai/sample-insurance-claim-prediction-dataset/code
    Dataset updated
    Jun 4, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Eason
    Description

    Content

    This is the "Sample Insurance Claim Prediction Dataset", which is based on "Medical Cost Personal Datasets" with updated sample values.

    • age: age of policyholder
    • sex: gender of policyholder (female=0, male=1)
    • bmi: body mass index, providing an understanding of body weights that are relatively high or low relative to height; an objective index of body weight (kg / m ^ 2) using the ratio of height to weight, ideally 18.5 to 25
    • steps: average walking steps per day of policyholder
    • children: number of children / dependents of policyholder
    • smoker: smoking state of policyholder (non-smoker=0; smoker=1)
    • region: the residential area of policyholder in the US (northeast=0, northwest=1, southeast=2, southwest=3)
    • charges: individual medical costs billed by health insurance
    • insuranceclaim: yes=1, no=0
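Because several columns in this description are stored as integer codes, a small decoding step can make rows easier to read. The sketch below builds lookup tables from the codes listed above. The `decode_row` helper and the sample row are hypothetical, and the sketch assumes the lowercase column names shown in the description match the file.

```python
# Code-to-label tables taken from the column description above.
SEX = {0: "female", 1: "male"}
SMOKER = {0: "non-smoker", 1: "smoker"}
REGION = {0: "northeast", 1: "northwest", 2: "southeast", 3: "southwest"}
INSURANCE_CLAIM = {0: "no", 1: "yes"}


def decode_row(row: dict) -> dict:
    """Return a copy of row with integer codes replaced by their labels."""
    decoded = dict(row)
    decoded["sex"] = SEX[row["sex"]]
    decoded["smoker"] = SMOKER[row["smoker"]]
    decoded["region"] = REGION[row["region"]]
    decoded["insuranceclaim"] = INSURANCE_CLAIM[row["insuranceclaim"]]
    return decoded


# Invented row for illustration only; not taken from the dataset.
sample = {
    "age": 40, "sex": 1, "bmi": 27.3, "steps": 8000, "children": 2,
    "smoker": 0, "region": 2, "charges": 1234.5, "insuranceclaim": 1,
}

print(decode_row(sample))
```

An unexpected code raises a `KeyError` here, which is usually preferable to passing an unknown value through unnoticed.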

  6. ‘ Medical Cost Personal Datasets’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘ Medical Cost Personal Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-medical-cost-personal-datasets-703f/f489ee08/?iid=012-673&v=presentation
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘ Medical Cost Personal Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mirichoi0218/insurance on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    Machine Learning with R by Brett Lantz is a book that provides an introduction to machine learning using R. As far as I can tell, Packt Publishing does not make its datasets available online unless you buy the book and create a user account which can be a problem if you are checking the book out from the library or borrowing the book from a friend. All of these datasets are in the public domain but simply needed some cleaning up and recoding to match the format in the book.

    Content

    Columns

    • age: age of primary beneficiary

    • sex: insurance contractor gender, female, male

    • bmi: Body mass index, providing an understanding of body, weights that are relatively high or low relative to height, objective index of body weight (kg / m ^ 2) using the ratio of height to weight, ideally 18.5 to 24.9

    • children: Number of children covered by health insurance / Number of dependents

    • smoker: Smoking

    • region: the beneficiary's residential area in the US, northeast, southeast, southwest, northwest.

    • charges: Individual medical costs billed by health insurance
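The bmi column can be illustrated with a short worked example. The kg / m ^ 2 unit given above implies weight in kilograms divided by the square of height in metres, and the sketch below assumes that reading. The function names and the example weight and height are hypothetical. The 18.5 to 24.9 range is the "ideal" range quoted in the column description.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float, low: float = 18.5, high: float = 24.9) -> bool:
    """Check a BMI value against the range quoted in the column description."""
    return low <= value <= high


# Hypothetical person: 70 kg, 1.75 m tall.
value = bmi(70.0, 1.75)
print(round(value, 2), in_ideal_range(value))
```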

    Acknowledgements

    The dataset is available on GitHub here.

    Inspiration

    Can you accurately predict insurance costs?

    --- Original source retains full ownership of the source dataset ---

  7. Data from: Health Insurance Cost Prediction

    • kaggle.com
    Updated Mar 11, 2018
    Cite
    Anne Aguirre (2018). Health Insurance Cost Prediction [Dataset]. https://www.kaggle.com/annetxu/health-insurance-cost-prediction/code
    Dataset updated
    Mar 11, 2018
    Dataset provided by
    Kaggle
    Authors
    Anne Aguirre
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Anne Aguirre

    Released under CC0: Public Domain

    Contents

  8. ‘Medical Insurance Premium Prediction’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 5, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Medical Insurance Premium Prediction’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-medical-insurance-premium-prediction-1dd9/cebbbb3b/?iid=008-678&v=presentation
    Dataset updated
    Aug 5, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Medical Insurance Premium Prediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tejashvi14/medical-insurance-premium-prediction on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    A Medical Insurance Company Has Released Data For Almost 1000 Customers. Create A Model That Predicts The Yearly Medical Cover Cost. The Data Is Voluntarily Given By Customers.

    Content

    The Dataset Contains Health Related Parameters Of The Customers. Use Them To Build A Model And Also Perform EDA On The Same. The Premium Price Is In INR(₹) Currency And Showcases Prices For A Whole Year.

    Inspiration

    Help Solve A Crucial Finance Problem That Would Potentially Impact Many People And Would Help Them Make Better Decisions. Don't Forget To Submit Your EDAs And Models In The Task Section. These Will Be Keenly Reviewed. Hope You Enjoy Working On The Data. Note: This is a dummy dataset used for teaching and training purposes. It is free to use. Image credits: Unsplash.

    --- Original source retains full ownership of the source dataset ---

  9. Health Insurance Dataset

    • kaggle.com
    Updated Jul 5, 2025
    Cite
    Mohamadreza Momeni (2025). Health Insurance Dataset [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/health-insurance-dataset
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    Kaggle
    Authors
    Mohamadreza Momeni
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Medical Insurance Expenses & Premium Dataset

    This dataset captures demographic and financial information related to medical insurance policyholders. It includes key features such as age, gender, BMI, number of children, discount eligibility status, and the geographic region of the insured. The dataset also provides the actual medical expenses incurred (expenses) and the insurance premium charged (premium).

    The purpose of this dataset is to support research and development of machine learning models for predicting healthcare costs, optimizing pricing strategies, and understanding factors that influence insurance expenses and premiums.

    Columns

    age: Age of the policyholder

    gender: Gender (male/female)

    bmi: Body Mass Index

    children: Number of children covered by the insurance

    discount_eligibility: Whether the policyholder is eligible for a discount (yes/no)

    region: Geographic region (e.g., southeast, northwest)

    expenses: Actual medical costs incurred by the policyholder (Target number 1)

    premium: Insurance premium charged (Target number 2)

    Example Use Cases

    Predicting insurance expenses for new applicants

    Analyzing which demographic factors contribute most to higher premiums

    Exploring correlations between BMI, age, and healthcare costs

    Developing regression and classification models for pricing optimization

  10. ‘Health Insurance Coverage’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Health Insurance Coverage’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-health-insurance-coverage-1c87/88f5e0a9/?iid=002-220&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Health Insurance Coverage’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hhs/health-insurance on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The Affordable Care Act (ACA) is the name for the comprehensive health care reform law and its amendments, which address health insurance coverage, health care costs, and preventive care. The law was enacted in two parts: The Patient Protection and Affordable Care Act was signed into law on March 23, 2010 by President Barack Obama and was amended by the Health Care and Education Reconciliation Act on March 30, 2010.

    Content

    This dataset provides health insurance coverage data for each state and the nation as a whole, including variables such as the uninsured rates before and after Obamacare, estimates of individuals covered by employer and marketplace healthcare plans, and enrollment in Medicare and Medicaid programs.

    Acknowledgements

    The health insurance coverage data was compiled from the US Department of Health and Human Services and US Census Bureau.

    Inspiration

    How has the Affordable Care Act changed the rate of citizens with health insurance coverage? Which states observed the greatest decline in their uninsured rate? Did those states expand Medicaid program coverage and/or implement a health insurance marketplace? What do you predict will happen to the nationwide uninsured rate in the next five years?

    --- Original source retains full ownership of the source dataset ---

  11. Health insurance yearly cost data

    • kaggle.com
    Updated Aug 20, 2020
    Cite
    David Fadnavis (2020). Health insurance yearly cost data [Dataset]. https://www.kaggle.com/davidfadnavis/health-insurance-yearly-cost-data/code
    Dataset updated
    Aug 20, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    David Fadnavis
    Description

    Dataset

    This dataset was created by David Fadnavis

    Contents

  12. Medical_Insurance_Cost_Dataset

    • kaggle.com
    Updated Jul 3, 2024
    Cite
    Sakshi Singh (2024). Medical_Insurance_Cost_Dataset [Dataset]. https://www.kaggle.com/datasets/sakshisinghssg/medical-insurance-cost-dataset
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sakshi Singh
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Sakshi Singh

    Released under Apache 2.0

    Contents

  13. ‘US Health Insurance Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 15, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘US Health Insurance Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-health-insurance-dataset-8b56/068994aa/?iid=012-655&v=presentation
    Dataset updated
    Nov 15, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘US Health Insurance Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/teertha/ushealthinsurancedataset on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    The venerable insurance industry is no stranger to data driven decision making. Yet in today's rapidly transforming digital landscape, Insurance is struggling to adapt and benefit from new technologies compared to other industries, even within the BFSI sphere (compared to the Banking sector for example.) Extremely complex underwriting rule-sets that are radically different in different product lines, many non-KYC environments with a lack of centralized customer information base, complex relationship with consumers in traditional risk underwriting where sometimes customer centricity runs reverse to business profit, inertia of regulatory compliance - are some of the unique challenges faced by Insurance Business.

    Despite this, emergent technologies like AI and Block Chain have brought a radical change in Insurance, and Data Analytics sits at the core of this transformation. We can identify 4 key factors behind the emergence of Analytics as a crucial part of InsurTech:

    • Big Data: The explosion of unstructured data in the form of images, videos, text, emails, social media
    • AI: The recent advances in Machine Learning and Deep Learning that can enable businesses to gain insight, do predictive analytics and build cost- and time-efficient innovative solutions
    • Real time Processing: Ability of real time information processing through various data feeds (for ex. social media, news)
    • Increased Computing Power: a complex ecosystem of new analytics vendors and solutions that enable carriers to combine data sources, external insights, and advanced modeling techniques in order to glean insights that were not possible before.

    This dataset can be helpful in a simple yet illuminating study in understanding the risk underwriting in Health Insurance, the interplay of various attributes of the insured and see how they affect the insurance premium.

    Content

    This dataset contains 1338 rows of insured data, where the Insurance charges are given against the following attributes of the insured: Age, Sex, BMI, Number of Children, Smoker and Region. There are no missing or undefined values in the dataset.

    Inspiration

    This relatively simple dataset should be an excellent starting point for EDA, Statistical Analysis and Hypothesis testing and training Linear Regression models for predicting Insurance Premium Charges.

    Proposed Tasks:

    • Exploratory Data Analytics
    • Statistical hypothesis testing
    • Statistical Modeling
    • Linear Regression

    --- Original source retains full ownership of the source dataset ---

  14. ‘Insurance Premium Prediction’ analyzed by Analyst-2

    • analyst-2.ai
    Updated May 7, 2017
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2017). ‘Insurance Premium Prediction’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-insurance-premium-prediction-5c29/0104c9d1/?iid=005-165&v=presentation
    Dataset updated
    May 7, 2017
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Insurance Premium Prediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/noordeen/insurance-premium-prediction on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The insurance.csv dataset contains 1338 observations (rows) and 7 features (columns). The dataset contains 4 numerical features (age, bmi, children and expenses) and 3 nominal features (sex, smoker and region) that were converted into factors with numerical value designated for each level.
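The conversion of nominal features into factors with a numerical value per level can be sketched as follows. This mirrors R's default convention of alphabetically sorted levels and 1-based integer codes. Whether the dataset used exactly this mapping is not stated, so treat the resulting codes as an assumption. The `encode_factor` helper and the sample values are hypothetical.

```python
def encode_factor(values):
    """Map each distinct value to a 1-based integer code, with levels sorted alphabetically."""
    levels = sorted(set(values))
    index = {level: code for code, level in enumerate(levels, start=1)}
    return [index[v] for v in values], levels


# Invented values for illustration only; not taken from the dataset.
regions = ["southwest", "northeast", "southwest", "northwest"]
codes, levels = encode_factor(regions)
print(codes, levels)
```

Returning the level list alongside the codes keeps the mapping reversible, so a code can be turned back into its label with `levels[code - 1]`.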

    Acknowledgements

    Insurance.csv file is obtained from the Machine Learning course website (Spring 2017) from Professor Eric Suess at http://www.sci.csueastbay.edu/~esuess/stat6620/#week-6.

    Inspiration

    The purpose of this exercise is to look into different features to observe their relationships, and to plot a multiple linear regression based on several features of an individual, such as age, physical/family condition and location, against their existing medical expenses. The model is to be used for predicting individuals' future medical expenses, which helps medical insurers decide what premium to charge.
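A multiple linear regression of the kind described can be sketched in plain Python by solving the normal equations. This is an illustrative implementation, not the method used in the original exercise. The two features (age and children) and the data points are synthetic, generated from a known linear rule so that the fit can be checked. They are not values from insurance.csv.

```python
def fit_ols(features, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in row] for row in features]  # prepend intercept term
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("features are collinear; no unique fit")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coefficients, row):
    """Evaluate the fitted linear model for one feature row."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic data following expenses = 1000 + 50 * age + 200 * children.
features = [(20, 0), (30, 1), (40, 0), (50, 2), (60, 3)]
targets = [2000.0, 2700.0, 3000.0, 3900.0, 4600.0]

coefficients = fit_ols(features, targets)
print([round(c, 6) for c in coefficients])
print(round(predict(coefficients, (35, 1)), 6))
```

For real work a numerical library would normally be preferable to hand-written elimination. The point here is only to show what "several features against medical expenses" means computationally.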

    --- Original source retains full ownership of the source dataset ---

  15. Insurance Policy Assets, Liabilities, and Premiums

    • kaggle.com
    Updated Jan 7, 2023
    Cite
    The Devastator (2023). Insurance Policy Assets, Liabilities, and Premiums [Dataset]. https://www.kaggle.com/datasets/thedevastator/ny-insurance-policy-assets-liabilities-and-premi/versions/2
    Dataset updated
    Jan 7, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    NY Insurance Policy Assets, Liabilities, and Premiums Annually

    Investigating the Impact of Financial Health on Health Insurance Costs

    By State of New York [source]

    About this dataset

    This dataset tracks health insurance premiums written in New York annually since 2004. It provides vital insight into the amount of money and risk taken on by insurance companies in the state, including what types of insurers are writing policies, how much they are taking on in assets and liabilities, and how this has shifted over time. This data will be invaluable to those looking to understand large-scale trends in the health insurance industry. The data has been updated as recently as 2021, so it provides a comprehensive picture of changes year-over-year spanning nearly two decades.


    How to use the dataset

    This dataset contains vital information regarding health insurance premiums, assets and liabilities related to policies written in New York annually. It is designed to provide key insights into the performance of insurance companies in New York state.

    The data consists of Type of Insurer, Company Name, Year, Assets, Liabilities and Premium Written for each policy written in every year since 2009. This data can be used to gain greater insight into the performance of certain companies within this industry over time as well as creating benchmarked comparison metrics against other companies within this market space.

    For individual or team exploration projects, you may want to compare one company's yearly assets, liabilities, or premiums against the average value for the same period to identify high- or low-performing periods, or look at how some variables changed across a 5-year (or wider) timescale, e.g. how assets and liabilities changed over 5 years.
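The comparison of one company against the per-year average could be sketched like this, using only the standard library. The column names ("Company Name", "Year", "Premium Written") follow the dataset's column table. The helper names, the company names, and all numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def yearly_average(rows, field):
    """Average of field across all companies, keyed by year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["Year"]].append(row[field])
    return {year: mean(values) for year, values in by_year.items()}


def deviation_from_average(rows, company, field):
    """For one company, the difference between its value and that year's average."""
    averages = yearly_average(rows, field)
    return {
        row["Year"]: row[field] - averages[row["Year"]]
        for row in rows
        if row["Company Name"] == company
    }


# Invented rows for illustration only; not taken from the dataset.
rows = [
    {"Company Name": "Insurer A", "Year": 2019, "Premium Written": 100},
    {"Company Name": "Insurer B", "Year": 2019, "Premium Written": 300},
    {"Company Name": "Insurer A", "Year": 2020, "Premium Written": 250},
    {"Company Name": "Insurer B", "Year": 2020, "Premium Written": 150},
]

print(deviation_from_average(rows, "Insurer A", "Premium Written"))
```

Positive values mark years in which the company was above the market average for that field, and negative values mark years below it.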

    By utilizing basic data visualizations like scatterplots and bar graphs we can start gaining more insights from our analysis by looking at potential correlations between variables such as: Are premium prices related to their assets? Does company size have an impact on the premium price? Have liabilities remained constant compared with past years?

    Administrators in management roles could also use this dataset to track yearly changes within their own company's results, such as following existing trends over longer periods and paying attention to changes that require further investigation or research.

    All in all, this dataset is a great tool for students, researchers, and analysts alike!

    Research Ideas

    • Establishing a baseline of average health insurance premiums in New York by year across different insurers.
    • Comparing insurance company assets and liabilities with their premium-written to provide an understanding of how profitable they are in the New York market.
    • Tracking the growth and success of health insurers in New York over time to understand changes in industry trends or policy standards

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    See the dataset description for more information.

    Columns

    File: health-insurance-premiums-on-policies-written-in-new-york-annually-1.csv

    | Column name | Description |
    |:----------------|:------------|
    | Type of Insurer | This column indicates the type of insurer that wrote the policy. (String) |
    | Company Name | This column indicates the name of the company that wrote the policy. (String) |
    | Year | This column indicates the year that the policy was written in. (Integer) |
    | Assets | This column indicates the total assets of the company that wrote the policy. (Integer) |
    | Liabilities | This column indicates the total liabilities of the company that wrote the policy. (Integer) |
    | Premium Written | This column indicates the total amount paid by an individual or organization for a given product or service annually. (Integer) |

    Acknowledgements

    If you use this dataset in your research, please credit the State of New York.

  16. a

    ai training dataset Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 10, 2025
    + more versions
    Cite
    Data Insights Market (2025). ai training dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-dataset-1502524
    Explore at:
    doc, pdf, ppt (available download formats)
    Dataset updated
    May 10, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    CA
    Variables measured
    Market Size
    Description

    The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the need for high-quality, labeled data to train sophisticated AI models capable of handling complex tasks. Applications span various industries, including IT, automotive, healthcare, BFSI (Banking, Financial Services, and Insurance), and retail & e-commerce. The demand for diverse data types—text, image/video, and audio—further fuels market expansion. While precise market sizing is unavailable, considering the rapid growth of AI and the significant investment in data annotation services, a reasonable estimate places the 2025 market value at approximately $15 billion, with a compound annual growth rate (CAGR) of 25% projected through 2033. This growth reflects a rising awareness of the pivotal role high-quality datasets play in achieving accurate and reliable AI outcomes. Key restraining factors include the high cost of data acquisition and annotation, along with concerns around data privacy and security. However, these challenges are being addressed through advancements in automation and the emergence of innovative data synthesis techniques. The competitive landscape is characterized by a mix of established technology giants like Google, Amazon, and Microsoft, alongside specialized data annotation companies like Appen and Lionbridge. The market is expected to see continued consolidation as larger players acquire smaller firms to expand their data offerings and strengthen their market position. Regional variations exist, with North America and Europe currently dominating the market share, although regions like Asia-Pacific are projected to experience significant growth due to increasing AI adoption and investments.

  17. Insurance Premium Data

    • kaggle.com
    Updated Sep 9, 2020
    Cite
    Simran Jain (2020). Insurance Premium Data [Dataset]. https://www.kaggle.com/datasets/simranjain17/insurance/code
    Dataset updated
    Sep 9, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Simran Jain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This Dataset is something I found online when I wanted to practice regression models. It is an openly available online dataset at multiple places. Though I do not know the exact origin and collection methodology of the data, I would recommend this dataset to everybody who is just beginning their journey in Data science.

  18. Hospital Management Dataset

    • kaggle.com
    Updated May 30, 2025
    Cite
    Kanak Baghel (2025). Hospital Management Dataset [Dataset]. https://www.kaggle.com/datasets/kanakbaghel/hospital-management-dataset
    Dataset updated
    May 30, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kanak Baghel
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a structured, multi-table dataset designed to simulate a hospital management system. It is ideal for practicing data analysis, SQL, machine learning, and healthcare analytics.

    Dataset Overview

    This dataset includes five CSV files:

    1. patients.csv – Patient demographics, contact details, registration info, and insurance data

    2. doctors.csv – Doctor profiles with specializations, experience, and contact information

    3. appointments.csv – Appointment dates, times, visit reasons, and statuses

    4. treatments.csv – Treatment types, descriptions, dates, and associated costs

    5. billing.csv – Billing amounts, payment methods, and status linked to treatments

    📁 Files & Column Descriptions

    patients.csv

    Contains patient demographic and registration details.

    Column Description

    patient_id -> Unique ID for each patient
    first_name -> Patient's first name
    last_name -> Patient's last name
    gender -> Gender (M/F)
    date_of_birth -> Date of birth
    contact_number -> Phone number
    address -> Address of the patient
    registration_date -> Date of first registration at the hospital
    insurance_provider -> Insurance company name
    insurance_number -> Policy number
    email -> Email address

    doctors.csv

    Details about the doctors working in the hospital.

    Column Description

    doctor_id -> Unique ID for each doctor
    first_name -> Doctor's first name
    last_name -> Doctor's last name
    specialization -> Medical field of expertise
    phone_number -> Contact number
    years_experience -> Total years of experience
    hospital_branch -> Branch of hospital where doctor is based
    email -> Official email address

    appointments.csv

    Records of scheduled and completed patient appointments.

    Column Description

    appointment_id -> Unique appointment ID
    patient_id -> ID of the patient
    doctor_id -> ID of the attending doctor
    appointment_date -> Date of the appointment
    appointment_time -> Time of the appointment
    reason_for_visit -> Purpose of visit (e.g., checkup)
    status -> Status (Scheduled, Completed, Cancelled)

    treatments.csv

    Information about the treatments given during appointments.

    Column Description

    treatment_id -> Unique ID for each treatment
    appointment_id -> Associated appointment ID
    treatment_type -> Type of treatment (e.g., MRI, X-ray)
    description -> Notes or procedure details
    cost -> Cost of treatment
    treatment_date -> Date when treatment was given

    billing.csv

    Billing and payment details for treatments.

    Column Description

    bill_id -> Unique billing ID
    patient_id -> ID of the billed patient
    treatment_id -> ID of the related treatment
    bill_date -> Date of billing
    amount -> Total amount billed
    payment_method -> Mode of payment (Cash, Card, Insurance)
    payment_status -> Status of payment (Paid, Pending, Failed)
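    The key columns described for these files suggest a join path from billing to treatments to appointments. The sketch below shows one way to follow it using Python's built-in sqlite3 module. The table layouts are trimmed to the columns needed, the ID formats and all row values are invented, and the column types are assumptions rather than the dataset's actual schema.

```python
import sqlite3

# In-memory stand-in for the CSV files; rows and ID formats are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appointments (appointment_id TEXT PRIMARY KEY, patient_id TEXT,
                           doctor_id TEXT, status TEXT);
CREATE TABLE treatments   (treatment_id TEXT PRIMARY KEY, appointment_id TEXT,
                           treatment_type TEXT, cost REAL);
CREATE TABLE billing      (bill_id TEXT PRIMARY KEY, patient_id TEXT,
                           treatment_id TEXT, amount REAL, payment_status TEXT);
INSERT INTO appointments VALUES ('A1', 'P1', 'D1', 'Completed'),
                                ('A2', 'P2', 'D1', 'Completed');
INSERT INTO treatments VALUES ('T1', 'A1', 'MRI', 300.0),
                              ('T2', 'A2', 'X-ray', 100.0);
INSERT INTO billing VALUES ('B1', 'P1', 'T1', 300.0, 'Paid'),
                           ('B2', 'P2', 'T2', 100.0, 'Pending');
""")

# Total billed and total paid per doctor: billing -> treatments -> appointments.
rows = conn.execute("""
SELECT a.doctor_id,
       SUM(b.amount) AS total_billed,
       SUM(CASE WHEN b.payment_status = 'Paid' THEN b.amount ELSE 0 END) AS total_paid
FROM billing AS b
JOIN treatments AS t ON t.treatment_id = b.treatment_id
JOIN appointments AS a ON a.appointment_id = t.appointment_id
GROUP BY a.doctor_id
ORDER BY a.doctor_id
""").fetchall()
print(rows)
```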

    Possible Use Cases

    SQL queries and relational database design

    Exploratory data analysis (EDA) and dashboarding

    Machine learning projects (e.g., cost prediction, no-show analysis)

    Feature engineering and data cleaning practice

    End-to-end healthcare analytics workflows

    Recommended Tools & Resources

    SQL (joins, filters, window functions)

    Pandas and Matplotlib/Seaborn for EDA

    Scikit-learn for ML models

    Pandas Profiling for automated EDA

    Plotly for interactive visualizations

    Please note:

    All data is synthetically generated for educational and project use. No real patient information is included.

    If you find this dataset helpful, consider upvoting or sharing your insights by creating a Kaggle notebook.

  19. US Health Insurance Dataset

    • kaggle.com
    Updated Feb 16, 2020
    Cite
    Anirban Datta (2020). US Health Insurance Dataset [Dataset]. https://www.kaggle.com/teertha/ushealthinsurancedataset/kernels
    Dataset updated
    Feb 16, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anirban Datta
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The venerable insurance industry is no stranger to data driven decision making. Yet in today's rapidly transforming digital landscape, Insurance is struggling to adapt and benefit from new technologies compared to other industries, even within the BFSI sphere (compared to the Banking sector for example.) Extremely complex underwriting rule-sets that are radically different in different product lines, many non-KYC environments with a lack of centralized customer information base, complex relationship with consumers in traditional risk underwriting where sometimes customer centricity runs reverse to business profit, inertia of regulatory compliance - are some of the unique challenges faced by Insurance Business.

    Despite this, emergent technologies like AI and Block Chain have brought a radical change in Insurance, and Data Analytics sits at the core of this transformation. We can identify 4 key factors behind the emergence of Analytics as a crucial part of InsurTech:

    • Big Data: The explosion of unstructured data in the form of images, videos, text, emails, social media
    • AI: The recent advances in Machine Learning and Deep Learning that can enable businesses to gain insight, do predictive analytics and build cost- and time-efficient innovative solutions
    • Real time Processing: Ability of real time information processing through various data feeds (e.g. social media, news)
    • Increased Computing Power: a complex ecosystem of new analytics vendors and solutions that enable carriers to combine data sources, external insights, and advanced modeling techniques in order to glean insights that were not possible before.

    This dataset can be helpful in a simple yet illuminating study in understanding the risk underwriting in Health Insurance, the interplay of various attributes of the insured and see how they affect the insurance premium.

    Content

    This dataset contains 1338 rows of insured data, where the Insurance charges are given against the following attributes of the insured: Age, Sex, BMI, Number of Children, Smoker and Region. There are no missing or undefined values in the dataset.

    Inspiration

    This relatively simple dataset should be an excellent starting point for EDA, Statistical Analysis and Hypothesis testing and training Linear Regression models for predicting Insurance Premium Charges.

    Proposed Tasks:
    - Exploratory Data Analytics
    - Statistical hypothesis testing
    - Statistical Modeling
    - Linear Regression
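    For the linear regression task, a minimal sketch of ordinary least squares with a single predictor is shown below. It regresses charges on age only, using four invented, exactly linear data points so the fitted line is easy to check by hand. The real dataset has 1338 rows and several more attributes, which would call for multiple regression.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values, constructed so that charges = 200 * age - 1000.
ages = [20, 30, 40, 50]
charges = [3000.0, 5000.0, 7000.0, 9000.0]

slope, intercept = fit_simple_ols(ages, charges)
print(slope, intercept)
```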

  20. Learning from Imbalanced Insurance Data

    • kaggle.com
    Updated Nov 23, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Möbius (2020). Learning from Imbalanced Insurance Data [Dataset]. https://www.kaggle.com/arashnic/imbalanced-data-practice/code
    Dataset updated
    Nov 23, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Möbius
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Insurance companies that sell life, health, and property and casualty insurance are using machine learning (ML) to drive improvements in customer service, fraud detection, and operational efficiency. The data comes from an insurance company that, like others in the industry, wants to take advantage of ML. This company provides Health Insurance to its customers. We can build a model to predict whether the policyholders (customers) from the past year will also be interested in Vehicle Insurance provided by the company.

    An insurance policy is an arrangement by which a company undertakes to provide a guarantee of compensation for specified loss, damage, illness, or death in return for the payment of a specified premium. A premium is a sum of money that the customer needs to pay regularly to an insurance company for this guarantee.

    For example, you may pay a premium of Rs. 5,000 each year for health insurance cover of Rs. 200,000 so that if, God forbid, you fall ill and need to be hospitalized in that year, the insurance company will bear the cost of hospitalization etc. up to Rs. 200,000. If you are wondering how the company can bear such a high hospitalization cost when it charges a premium of only Rs. 5,000, that is where the concept of probabilities comes into the picture. For example, there may be 100 customers like you paying a premium of Rs. 5,000 every year, but only a few of them (say 2-3) would be hospitalized that year. This way everyone shares the risk of everyone else.
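    The pooling arithmetic in this example can be checked with a few lines of Python. The sketch below uses the paragraph's figures (100 customers, a Rs. 5,000 premium, Rs. 200,000 cover) and assumes the worst case where every claim uses the full cover, which the text does not state; real claims would often be smaller.

```python
def pool_summary(customers, premium, cover):
    """Compare total premiums collected with worst-case claims at full cover."""
    pool = customers * premium                    # total premiums collected
    full_claims_covered = pool // cover           # whole full-cover claims the pool can pay
    break_even_rate = pool / (customers * cover)  # share of customers claiming in full
    return pool, full_claims_covered, break_even_rate


pool, full_claims_covered, break_even_rate = pool_summary(100, 5000, 200_000)
print(pool, full_claims_covered, break_even_rate)  # 500000 2 0.025
```

    Under that worst-case assumption the pool of Rs. 500,000 pays for two full claims but not three, so the break-even point sits at 2.5 full-cover claims per 100 customers.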

    Just like medical insurance, there is vehicle insurance, where every year the customer pays a premium of a certain amount to the insurance company so that, in case of an unfortunate accident involving the vehicle, the company will provide compensation (called ‘sum assured’) to the customer.

    Content

    Building a model to predict whether a customer would be interested in Vehicle Insurance is extremely helpful for the company because it can then accordingly plan its communication strategy to reach out to those customers and optimize its business model and revenue.

    We have information about:
    - Demographics (gender, age, region code type)
    - Vehicles (Vehicle Age, Damage)
    - Policy (Premium, sourcing channel) etc.

    Update: Test data target values have been added. To evaluate your models more precisely you can use: https://www.kaggle.com/arashnic/answer


    Moreover, a supplemental goal is to practice learning from imbalanced data and to verify how the results can help in a real operational process. The Response feature (target) is highly imbalanced.


    0    319594
    1     62531
    Name: Response, dtype: int64

    Practicing techniques such as resampling is useful for verifying their impact on validation results and the confusion matrix.

    Figure: Under-sampling with Tomek links (https://miro.medium.com/max/640/1*KxFmI15rxhvKRVl-febp-Q.png)
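    To make the imbalance concrete, the sketch below computes the minority share and majority-to-minority ratio from the Response counts quoted above, then demonstrates plain random under-sampling on a toy label list. Random under-sampling is used here as a simpler stand-in; it is not the Tomek links method named in the figure, and the toy labels are invented.

```python
import random

# Class counts for the Response target, as quoted in the description.
counts = {0: 319594, 1: 62531}
total = sum(counts.values())
minority_share = counts[1] / total        # roughly 0.164
imbalance_ratio = counts[0] / counts[1]   # roughly 5.1 to 1


def random_undersample(labels, seed=0):
    """Return sorted row indices with every class cut to the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


# Toy labels (hypothetical): ten majority rows and two minority rows.
labels = [0] * 10 + [1] * 2
kept = random_undersample(labels)
print(round(minority_share, 4), round(imbalance_ratio, 2), len(kept))
```

    The resampled indices would normally be applied to the training split only, so that validation metrics and the confusion matrix still reflect the original class balance.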

    Starter Kernel(s)

    Inspiration

    Predict whether a customer would be interested in Vehicle Insurance


    MORE DATASETs ...

Cite
US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/datasets/hhs/health-insurance-marketplace

Health Insurance Marketplace

Explore health and dental plans data in the US Health Insurance Marketplace

Explore at:
zip (868821924 bytes), available download format
Dataset updated
May 1, 2017
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Authors
US Department of Health and Human Services
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.

median plan premiums

Exploration Ideas

To help get you started, here are some data exploration ideas:

  • How do plan rates and benefits vary across states?
  • How do plan benefits relate to plan rates?
  • How do plan rates vary by age?
  • How do plans vary across insurance network providers?

See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

Data Description

This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

Here, we've processed the data to facilitate analytics. This processed version has three components:

1. Original versions of the data

The original versions of the 2014, 2015, 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

2. Combined CSV files that contain

In the top-level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

  • BenefitsCostSharing.csv
  • BusinessRules.csv
  • Network.csv
  • PlanAttributes.csv
  • Rate.csv
  • ServiceArea.csv

Additionally, there are two CSV files that facilitate joining data across years:

  • Crosswalk2015.csv - joining 2014 and 2015 data
  • Crosswalk2016.csv - joining 2015 and 2016 data

3. SQLite database

The "database.sqlite" file contains tables corresponding to each of the processed CSV files.
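A quick way to see which tables such a file contains is to query SQLite's built-in sqlite_master catalog. The sketch below runs against an in-memory stand-in so it is self-contained; the table names Rate and Network are assumed to mirror the CSV file names, and the StateCode column is purely hypothetical.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in database; with the real download you would instead call
# sqlite3.connect("database.sqlite").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rate (StateCode TEXT)")     # hypothetical column
conn.execute("CREATE TABLE Network (StateCode TEXT)")  # hypothetical column

tables = list_tables(conn)
print(tables)
```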

The code to create the processed version of this data is available on GitHub.
