12 datasets found
  1. Health Insurance Marketplace

    • kaggle.com
    zip
    Updated May 1, 2017
    Cite
    US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/datasets/hhs/health-insurance-marketplace
    Explore at:
    Available download formats: zip (868821924 bytes)
    Dataset updated
    May 1, 2017
    Dataset provided by
    United States Department of Health and Human Services (http://www.hhs.gov/)
    Authors
    US Department of Health and Human Services
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.

    median plan premiums

    Exploration Ideas

    To help get you started, here are some data exploration ideas:

    • How do plan rates and benefits vary across states?
    • How do plan benefits relate to plan rates?
    • How do plan rates vary by age?
    • How do plans vary across insurance network providers?

    See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

    Data Description

    This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

    Here, we've processed the data to facilitate analytics. This processed version has three components:

    1. Original versions of the data

    The original versions of the 2014, 2015, and 2016 data are available in the "raw" directory of the download and in "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

    2. Combined CSV files

    In the top-level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

    • BenefitsCostSharing.csv
    • BusinessRules.csv
    • Network.csv
    • PlanAttributes.csv
    • Rate.csv
    • ServiceArea.csv

    Additionally, there are two CSV files that facilitate joining data across years:

    • Crosswalk2015.csv - joining 2014 and 2015 data
    • Crosswalk2016.csv - joining 2015 and 2016 data

    3. SQLite database

    The "database.sqlite" file contains tables corresponding to each of the processed CSV files.

    The code to create the processed version of this data is available on GitHub.
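
    As a minimal sketch of getting oriented in the SQLite component, the snippet below lists every table in a database together with its row count, using only Python's standard sqlite3 module. The file name "database.sqlite" comes from the description above; the helper name table_row_counts and the in-memory demo table (a Rate table with made-up columns and values) are illustrative assumptions, not the dataset's actual schema.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Demo on an in-memory database with a made-up table, so the sketch runs
# without the download. For the real file: sqlite3.connect("database.sqlite").
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Rate (PlanId TEXT, Age TEXT, IndividualRate REAL)")
demo.executemany(
    "INSERT INTO Rate VALUES (?, ?, ?)",
    [("P1", "21", 250.0), ("P1", "40", 320.0)],
)
demo_counts = table_row_counts(demo)
print(demo_counts)
```

    Run against the real file, the same helper should report one table per processed CSV, which is a quick check that the download is intact before writing longer queries.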

  2. medical-insurance-charges-dataset

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    F (2025). medical-insurance-charges-dataset [Dataset]. https://huggingface.co/datasets/affnanation/medical-insurance-charges-dataset
    Explore at:
    Dataset updated
    Jul 15, 2025
    Authors
    F
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset: Medical Insurance Cost

    This is the dataset used to train and evaluate the health insurance cost prediction model for the RiskGuard project. The main code repository can be found on GitHub.

    Dataset Description

    This dataset originates from Kaggle (Medical Cost Personal Datasets) and contains demographic and personal attributes of insurance customers. It is used to predict individual medical costs.

    Data Columns

    age: Age of the primary beneficiary… See the full description on the dataset page: https://huggingface.co/datasets/affnanation/medical-insurance-charges-dataset.

  3. ‘Medical Insurance Premium Prediction’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Medical Insurance Premium Prediction’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-medical-insurance-premium-prediction-5cfe/827b15fc/?iid=021-110&v=presentation
    Explore at:
    Dataset updated
    Aug 4, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Medical Insurance Premium Prediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tejashvi14/medical-insurance-premium-prediction on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    A medical insurance company has released data for almost 1000 customers. Create a model that predicts the yearly medical cover cost. The data is voluntarily given by customers.

    Content

    The dataset contains health-related parameters of the customers. Use them to build a model and also perform EDA on the same. The premium price is in INR (₹) and covers a whole year.

    Inspiration

    Help solve a crucial finance problem that would potentially impact many people and would help them make better decisions. Don't forget to submit your EDAs and models in the Task section. These will be keenly reviewed. Hope you enjoy working on the data. Note: This is a dummy dataset used for teaching and training purposes. It is free to use. Image credits: Unsplash.

    --- Original source retains full ownership of the source dataset ---

  4. Insurance

    • kaggle.com
    Updated Jun 5, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    G DEEPAK REDDY (2022). Insurance [Dataset]. https://www.kaggle.com/datasets/gdeepakreddy/insurance
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    G DEEPAK REDDY
    Description

    Business Problem: Health care is a very important domain in the market. It is directly linked with the life of the individual, so we have to be proactive in this domain. Money plays a major role because treatment can become very costly, and an individual who is not covered by insurance can end up in a tough financial situation. Medical insurance companies also want to reduce their risk by optimizing the insurance cost, because a healthy body is largely in the hands of the individual: if an individual eats healthily and exercises properly, the chance of getting ill is drastically reduced.

    Goal & Objective: The objective of this exercise is to build a model, using the data, that provides the optimum insurance cost for an individual. You have to use the health- and habit-related parameters to estimate the cost of insurance.

    Review Parameters / Review points

    1) Introduction of the business problem
    a) Defining problem statement
    b) Need of the study/project
    c) Understanding business/social opportunity

    2) Data Report
    a) Understanding how data was collected in terms of time, frequency and methodology
    b) Visual inspection of data (rows, columns, descriptive details)
    c) Understanding of attributes (variable info, renaming if required)

    3) Exploratory data analysis
    a) Univariate analysis (distribution and spread for every continuous attribute, distribution of data in categories for categorical ones)
    b) Bivariate analysis (relationship between different variables, correlations)
    c) Removal of unwanted variables (if applicable)
    d) Missing value treatment (if applicable)
    e) Outlier treatment (if required)
    f) Variable transformation (if applicable)
    g) Addition of new variables (if required)

    4) Business insights from EDA
    a) Is the data unbalanced? If so, what can be done? Please explain in the context of the business
    b) Any business insights using clustering (if applicable)
    c) Any other business insights

  5. Medical_Insurance_Cost_Dataset

    • kaggle.com
    Updated Jul 3, 2024
    Cite
    Sakshi Singh (2024). Medical_Insurance_Cost_Dataset [Dataset]. https://www.kaggle.com/datasets/sakshisinghssg/medical-insurance-cost-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sakshi Singh
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Sakshi Singh

    Released under Apache 2.0

    Contents

  6. Health Insurance Marketplaces

    • kaggle.com
    Updated Jan 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Health Insurance Marketplaces [Dataset]. https://www.kaggle.com/datasets/thedevastator/health-insurance-marketplaces/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Health Insurance Marketplaces

    Rates, Benefits, Coverage and Networks

    By Data Society [source]

    About this dataset

    Do you want to explore the complexities of the Health Insurance Marketplace and uncover insights into plan rates, benefits, and networks? Look no further! With this dataset from the Centers for Medicare & Medicaid Services (CMS), you can investigate trends in plan rates, access coverage across states and zip codes, compare metal level plans (across years), and analyze benefit information, all in one place.

    We’ve provided six CSV files containing combined data from across all years: BenefitsCostSharing.csv provides details on benefits; BusinessRules.csv provides details about premium payment requirements for a plan or set of plans; Network.csv offers details about health plans’ networks of providers who offer services at different cost levels to members enrolled in a given plan or set of plans; PlanAttributes.csv gives attributes like age-off dates for various plans; Rate.csv delivers information on rate changes; and ServiceArea.csv reveals demographic characteristics related to each service area associated with a specific issuer. Two further CSV files join data across years (Crosswalk2015 and Crosswalk2016).

    So come on board and use your creativity to unlock the mysteries behind changes in benefits in relation to costs while exploring network providers within different regions!


    How to use the dataset

    This dataset contains information about the health insurance plans offered in the US Health Insurance Marketplace. It includes data on plan benefits, cost-sharing, networks, rates and service areas for different states. The data can be used to compare and analyze plan characteristics across different states and ages, which can help guide users' decision-making when purchasing a health insurance plan.

    To begin using the dataset, you should start by looking at the columns available. These include State, Dental Plan, Multistate Plan (2015 & 2016), Metal Level (2015 & 2016), Child/Adult Only (2015 & 2016), FIPS Code, Zip Code Crosswalk Level, Reason for Crosswalk, Multistate Plan Ageoff (2016 & 2015) and MetalLevel Ageoff (2016 & 2015). These columns provide important information on each plan that can be used to compare them across states or between years.

    Using this data you can explore several interesting questions such as: How do benefit levels vary among states? Are there any differences in network providers between states? What factors influence plan rates?

    To answer these questions, join the relevant tables across years using the Crosswalk2015 and Crosswalk2016 CSV files, then organize the data so that differences between plans sold in different states or years are easier to visualize. Once the data is organized, visualizations such as line graphs or bar charts can make feature-by-feature comparisons between two plans clearer and help distinguish the variations among plans available to consumers.

    By doing this you can gain a better understanding of how certain factors may affect rate changes over time, or how certain benefit levels might differ by state, which will allow consumers to make an informed choice when selecting their next health insurance plan.
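
    The cross-year join described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the crosswalk column names PlanID_2014 and PlanID_2015, the helper name join_across_years, and all plan IDs and rates are hypothetical and should be checked against the data dictionaries before use.

```python
def join_across_years(old_rates, new_rates, crosswalk_rows,
                      old_key="PlanID_2014", new_key="PlanID_2015"):
    """Pair each earlier-year plan with its successor and compare rates.

    old_rates / new_rates map plan ID -> rate; crosswalk_rows is a list of
    dicts such as csv.DictReader would yield. The key names are assumptions.
    """
    joined = []
    for row in crosswalk_rows:
        old_id, new_id = row[old_key], row[new_key]
        # Skip crosswalk entries whose plans are missing from either year.
        if old_id in old_rates and new_id in new_rates:
            joined.append({
                "old_plan": old_id,
                "new_plan": new_id,
                "rate_change": new_rates[new_id] - old_rates[old_id],
            })
    return joined


# Made-up values purely for illustration.
rates_2014 = {"A-1": 200.0, "B-1": 310.0}
rates_2015 = {"A-2": 215.0, "B-2": 300.0}
crosswalk = [
    {"PlanID_2014": "A-1", "PlanID_2015": "A-2"},
    {"PlanID_2014": "B-1", "PlanID_2015": "B-2"},
    {"PlanID_2014": "C-1", "PlanID_2015": "C-2"},  # absent from the rate tables
]
result = join_across_years(rates_2014, rates_2015, crosswalk)
for entry in result:
    print(entry)
```

    Keeping the join as a list of small records makes it straightforward to feed the rate changes into a bar chart or to group them by state afterwards.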

    Research Ideas

    • Analyzing the effectiveness of different plan benefits and how they affect premiums to determine a fair price point for different types of healthcare plans.
    • Examining the variation in rates, benefits and coverage by state or zip code to identify potential trends or disparities in access to quality health care services across regions.
    • Developing an algorithm that can predict premium prices based on factors such as age group, type of plan (metal level), and multistate coverage, to help consumers more easily understand the true cost of their health insurance plans before committing to purchase them.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit -...

  7. Health insurance price

    • kaggle.com
    Updated Sep 9, 2024
    Cite
    Amar Ahmed Hamed (2024). Health insurance price [Dataset]. https://www.kaggle.com/datasets/amarahmedhamed/health-insurance-price/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Amar Ahmed Hamed
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Amar Ahmed Hamed

    Released under Apache 2.0

    Contents

  8. Health Insurance Dataset

    • figshare.com
    csv
    Updated Mar 11, 2025
    Cite
    Prakash M C (2025). Health Insurance Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28571408.v1
    Explore at:
    Available download formats: csv
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    figshare
    Authors
    Prakash M C
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains expense and premium details related to health insurance.

  9. Medical Policy Premium Dataset

    • kaggle.com
    Updated Sep 22, 2021
    Cite
    Sunil Sanjay Hule (2021). Medical Policy Premium Dataset [Dataset]. https://www.kaggle.com/sunilhule/medical-policy-premium-dataset/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 22, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sunil Sanjay Hule
    Description

    Context

    The data set contains each user's medical history, recorded as whether or not they have specific conditions, together with their age, height, weight, etc., and the premium they have to pay in INR for insurance.

    Content

    This tabular data contains 11 columns covering the patient's medical records and current health conditions.

  10. Health Insurance Lead Prediction

    • kaggle.com
    zip
    Updated Feb 26, 2021
    Cite
    Bhavyajot Malhotra (2021). Health Insurance Lead Prediction [Dataset]. https://www.kaggle.com/bhavyajotmalhotra/jobathon-health-insurance-prediction
    Explore at:
    Available download formats: zip (1129580 bytes)
    Dataset updated
    Feb 26, 2021
    Authors
    Bhavyajot Malhotra
    Description

    Your client FinMan is a financial services company that provides various financial services, such as loans, investment funds, and insurance, to its customers. FinMan wishes to cross-sell health insurance to existing customers who may or may not hold insurance policies with the company. The company recommends health insurance to its customers based on their profile once these customers land on the website. Customers might browse the recommended health insurance policy and consequently fill out a form to apply. When these customers fill out the form, their Response towards the policy is considered positive and they are classified as a lead.

    Once these leads are acquired, the sales advisors approach them to convert and thus the company can sell proposed health insurance to these leads in a more efficient manner.

    Now the company needs your help in building a model to predict whether the person will be interested in their proposed Health plan/policy given the information about:

    • Demographics (city, age, region etc.)
    • Information regarding holding policies of the customer
    • Recommended Policy Information

    Data Dictionary

    Train Data

    Variable: Definition
    ID: Unique Identifier for a row
    City_Code: Code for the City of the customers
    Region_Code: Code for the Region of the customers
    Accomodation_Type: Customer Owns or Rents the house
    Reco_Insurance_Type: Joint or Individual type for the recommended insurance
    Upper_Age: Maximum age of the customer
    Lower_Age: Minimum age of the customer
    Is_Spouse: If the customers are married to each other (in case of joint insurance)
    Health_Indicator: Encoded values for health of the customer
    Holding_Policy_Duration: Duration (in years) of holding policy (a policy that customer has already subscribed to with the company)
    Holding_Policy_Type: Type of holding policy
    Reco_Policy_Cat: Encoded value for recommended health insurance
    Reco_Policy_Premium: Annual Premium (INR) for the recommended health insurance
    Response (Target): 0 : Customer did not show interest in the recommended policy, 1 : Customer showed interest in the recommended policy

    Test Data

    Variable: Definition
    ID: Unique Identifier for a row
    City_Code: Code for the City of the customers
    Region_Code: Code for the Region of the customers
    Accomodation_Type: Customer Owns or Rents the house
    Reco_Insurance_Type: Joint or Individual type for the recommended insurance
    Upper_Age: Maximum age of the customer
    Lower_Age: Minimum age of the customer
    Is_Spouse: If the customers are married to each other (in case of joint insurance)
    Health_Indicator: Encoded values for health of the customer
    Holding_Policy_Duration: Duration (in years) of holding policy (a policy that customer has already subscribed to with the company)
    Holding_Policy_Type: Type of holding policy
    Reco_Policy_Cat: Encoded value for recommended health insurance
    Reco_Policy_Premium: Annual Premium (INR) for the recommended health insurance

    Variable: Definition
    ID: Unique Identifier for a row
    Response (Target): Probability of Customer showing interest (class 1)
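
    To show how the data dictionary above might be used in a first pass, here is a small sketch that computes the share of leads (Response equal to 1) per value of a chosen column, using only the standard library. The column names come from the dictionary; the helper name lead_rate_by and the four sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def lead_rate_by(rows, column):
    """Share of rows with Response == 1, grouped by the given column."""
    totals = defaultdict(int)
    leads = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        leads[key] += int(row["Response"])
    return {key: leads[key] / totals[key] for key in totals}


# Made-up rows for illustration only; not taken from the dataset.
sample_csv = """ID,Reco_Insurance_Type,Reco_Policy_Premium,Response
1,Individual,11628,0
2,Joint,30510,1
3,Individual,7450,1
4,Joint,17780,0
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
rates = lead_rate_by(rows, "Reco_Insurance_Type")
print(rates)
```

    The same helper works for any categorical column in the train data, such as Accomodation_Type or Holding_Policy_Type, which makes it a cheap way to spot features worth modelling.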

  11. AV: JantaHackathon

    • kaggle.com
    Updated Sep 12, 2020
    Cite
    Kunal Bambardekar (2020). AV: JantaHackathon [Dataset]. https://www.kaggle.com/kbambardekar/av-jantahackathon/activity
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 12, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kunal Bambardekar
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Will you take vehicle insurance?

    An insurance policy is an arrangement by which a company undertakes to provide a guarantee of compensation for specified loss, damage, illness, or death in return for the payment of a specified premium. A premium is a sum of money that the customer needs to pay regularly to an insurance company for this guarantee.

    For example, you may pay a premium of Rs. 5000 each year for a health insurance cover of Rs. 200,000/- so that if, God forbid, you fall ill and need to be hospitalised in that year, the insurance provider company will bear the cost of hospitalisation etc. for up to Rs. 200,000. Now if you are wondering how the company can bear such a high hospitalisation cost when it charges a premium of only Rs. 5000/-, that is where the concept of probabilities comes into the picture. For example, like you, there may be 100 customers who pay a premium of Rs. 5000 every year, but only a few of them (say 2-3) would get hospitalised that year, not everyone. This way everyone shares the risk of everyone else.
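
    The arithmetic behind this example can be made explicit with a short sketch. It uses the figures quoted above (100 customers, a Rs. 5000 premium, Rs. 200,000 cover) and assumes, as a worst case that the text does not state, that every claim uses the full cover amount. The helper name pool_balance is hypothetical.

```python
def pool_balance(customers, premium, cover, claims):
    """Premiums collected minus payouts, assuming each claim uses the full cover."""
    collected = customers * premium
    paid_out = claims * cover
    return collected - paid_out


# Figures from the example: 100 customers, Rs. 5000 premium, Rs. 200,000 cover.
for claims in (2, 3):
    balance = pool_balance(customers=100, premium=5000, cover=200_000, claims=claims)
    print(f"{claims} claims at full cover -> balance Rs. {balance}")
```

    Under that worst-case assumption the Rs. 500,000 pool covers two full claims with Rs. 100,000 to spare but falls Rs. 100,000 short on three, so the example only balances if the number of claimants stays at the low end or most claims cost less than the full cover.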

    Just like medical insurance, there is vehicle insurance, where every year the customer needs to pay a premium of a certain amount to the insurance provider company so that in case of an unfortunate accident involving the vehicle, the insurance provider company will provide compensation (called ‘sum assured’) to the customer.

    Building a model to predict whether a customer would be interested in Vehicle Insurance is extremely helpful for the company because it can then plan its communication strategy accordingly to reach out to those customers and optimise its business model and revenue.

    Now, in order to predict whether the customer would be interested in Vehicle insurance, you have information about demographics (gender, age, region code type), vehicles (Vehicle Age, Damage), policy (Premium, sourcing channel), etc.

    Content

    id: Unique ID for the customer
    Gender: Gender of the customer
    Age: Age of the customer
    driving license: 0 : Customer does not have DL, 1 : Customer already has DL
    RegionCode: Unique code for the region of the customer
    PreviouslyInsured: 1 : Customer already has Vehicle Insurance, 0 : Customer doesn't have Vehicle Insurance
    VehicleAge: Age of the Vehicle
    VehicleDamage: 1 : Customer got his/her vehicle damaged in the past, 0 : Customer didn't get his/her vehicle damaged in the past
    AnnualPremium: The amount customer needs to pay as premium in the year
    PolicySalesChannel: Anonymised Code for the channel of outreaching to the customer, i.e. Different Agents, Over Mail, Over Phone, In Person, etc.
    Vintage: Number of Days, Customer has been associated with the company
    Response: 1 : Customer is interested, 0 : Customer is not interested

    Acknowledgements


    Original data source: Analytics Vidhya, https://datahack.analyticsvidhya.com/contest/janatahack-cross-sell-prediction/#About


  12. Cervical Cancer Risk Classification

    • kaggle.com
    zip
    Updated Aug 31, 2017
    Cite
    Gokagglers (2017). Cervical Cancer Risk Classification [Dataset]. https://www.kaggle.com/forums/f/5746/cervical-cancer-risk-classification/t/91763/data-is-from-caracas-venezuela?forumMessageId=528857
    Explore at:
    Available download formats: zip (9052 bytes)
    Dataset updated
    Aug 31, 2017
    Authors
    Gokagglers
    Description

    Cervical Cancer Risk Factors for Biopsy: This Dataset is Obtained from UCI Repository and kindly acknowledged!

    This file contains a List of Risk Factors for Cervical Cancer leading to a Biopsy Examination!

    About 11,000 new cases of invasive cervical cancer are diagnosed each year in the U.S. However, the number of new cervical cancer cases has been declining steadily over the past decades. Although it is the most preventable type of cancer, each year cervical cancer kills about 4,000 women in the U.S. and about 300,000 women worldwide. In the United States, cervical cancer mortality rates plunged by 74% from 1955 - 1992 thanks to increased screening and early detection with the Pap test.

    AGE

    Fifty percent of cervical cancer diagnoses occur in women ages 35 - 54, and about 20% occur in women over 65 years of age. The median age of diagnosis is 48 years. About 15% of women develop cervical cancer between the ages of 20 - 30. Cervical cancer is extremely rare in women younger than age 20. However, many young women become infected with multiple types of human papilloma virus, which then can increase their risk of getting cervical cancer in the future. Young women with early abnormal changes who do not have regular examinations are at high risk for localized cancer by the time they are age 40, and for invasive cancer by age 50.

    SOCIOECONOMIC AND ETHNIC FACTORS

    Although the rate of cervical cancer has declined among both Caucasian and African-American women over the past decades, it remains much more prevalent in African-Americans, whose death rates are twice as high as those of Caucasian women. Hispanic American women have more than twice the risk of invasive cervical cancer as Caucasian women, also due to a lower rate of screening. These differences, however, are almost certainly due to social and economic differences. Numerous studies report that high poverty levels are linked with low screening rates. In addition, lack of health insurance, limited transportation, and language difficulties hinder a poor woman’s access to screening services.

    HIGH SEXUAL ACTIVITY

    Human papilloma virus (HPV) is the main risk factor for cervical cancer. In adults, the most important risk factor for HPV is sexual activity with an infected person. Women most at risk for cervical cancer are those with a history of multiple sexual partners, sexual intercourse at age 17 years or younger, or both. A woman who has never been sexually active has a very low risk for developing cervical cancer. Sexual activity with multiple partners increases the likelihood of many other sexually transmitted infections (chlamydia, gonorrhea, syphilis). Studies have found an association between chlamydia and cervical cancer risk, including the possibility that chlamydia may prolong HPV infection.

    FAMILY HISTORY

    Women have a higher risk of cervical cancer if they have a first-degree relative (mother, sister) who has had cervical cancer.

    USE OF ORAL CONTRACEPTIVES

    Studies have reported a strong association between cervical cancer and long-term use of oral contraception (OC). Women who take birth control pills for more than 5 - 10 years appear to have a much higher risk of HPV infection (up to four times higher) than those who do not use OCs. (Women taking OCs for fewer than 5 years do not have a significantly higher risk.) The reasons for this risk from OC use are not entirely clear. Women who use OCs may be less likely to use a diaphragm, condoms, or other methods that offer some protection against sexually transmitted diseases, including HPV. Some research also suggests that the hormones in OCs might help the virus enter the genetic material of cervical cells.

    HAVING MANY CHILDREN

    Studies indicate that having many children increases the risk for developing cervical cancer, particularly in women infected with HPV.

    SMOKING

    Smoking is associated with a higher risk for precancerous changes (dysplasia) in the cervix and for progression to invasive cervical cancer, especially for women infected with HPV.

    IMMUNOSUPPRESSION

    Women with weak immune systems (such as those with HIV/AIDS) are more susceptible to acquiring HPV. Immunocompromised patients are also at higher risk for having cervical precancer develop rapidly into invasive cancer.

    DIETHYLSTILBESTROL (DES)

    From 1938 - 1971, diethylstilbestrol (DES), an estrogen-related drug, was widely prescribed to pregnant women to help prevent miscarriages. The daughters of these women face a higher risk for cervical cancer. DES is no longer prescribed.

