https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains insurance rate data from across the United States, providing insights into the premiums charged by insurers, the underlying factors that affect those rates, and claims history analysis. It includes information on premiums, underlying factors, current premium prices, indicated premium prices, selected premium prices, fixed expenses, and more.
The dataset can be used to research insurance rates across the United States and to:
- Understand the inner workings of the insurance industry and how rates are calculated
- Help insurance companies better understand their own pricing models
- Help consumers understand how their premiums are calculated
I would like to acknowledge The Markup for providing the data for this dataset.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: cgr-definitions-table.csv

| Column name | Description |
|:------------|:------------|
| cgr | Combined grade rating. (Numeric) |
| aa | Average annual premium. (Numeric) |
| bb | Base premium. (Numeric) |
| cc | Cost of capital. (Numeric) |
| va | Value of assets. (Numeric) |
| dd | Direct written premium. (Numeric) |
| hh | Homeownership. (Categorical) |
File: cgr-premiums-table.csv

| Column name | Description |
|:------------|:------------|
| territory | The territory in which the person lives. (String) |
| gender | The person's gender. (String) |
| birthdate | The person's birthdate. (Date) |
| ypc | The person's years of prior coverage. (Integer) |
| current_premium | The person's current premium. (Float) |
| indicated_premium | The person's indicated premium. (Float) |
| selected_premium | The person's selected premium. (Float) |
| underlying_premium | The person's underlying premium. (Float) |
| fixed_expenses | The person's fixed expenses. (Float) |
| underlying_total_premium | The person's underlying total premium. (Float) |
| cgr_factor | The person's CGR factor. (Float) |
File: territory-definitions-table.csv

| Column name | Description |
|:------------|:------------|
| territory | The territory in which the person lives. (String) |
| county | The county in which the person lives. (String) |
| county_code | The county code for the county in which the person lives. (String) |
| zipcode | The zip code for the county in which the person lives. (String) |
| town | The town in which the person lives. (String) |
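To analyze premiums geographically, the premiums table can be joined to the territory definitions on their shared `territory` column. The sketch below uses only Python's standard `csv` module. The helper name `join_on_territory` and the derived `selected_vs_current` field are illustrative assumptions, not part of the dataset, and the code assumes a territory may map to several zip codes (so it emits one output row per match, like an inner join).

```python
import csv
import io
from typing import Iterable


def join_on_territory(premiums_csv: Iterable[str], territories_csv: Iterable[str]) -> list:
    """Inner-join premium rows to territory rows on the shared `territory` column."""
    # Index territory definitions by code. A territory may cover several
    # zip codes, so keep every matching row (one-to-many join).
    territories = {}
    for row in csv.DictReader(territories_csv):
        territories.setdefault(row["territory"], []).append(row)

    joined = []
    for row in csv.DictReader(premiums_csv):
        for terr in territories.get(row["territory"], []):
            merged = {**row, **terr}
            # Hypothetical derived field: relative change from the current
            # premium to the selected premium.
            merged["selected_vs_current"] = (
                float(row["selected_premium"]) / float(row["current_premium"]) - 1.0
            )
            joined.append(merged)
    return joined


# Usage with small in-memory CSVs (made-up values for illustration).
premiums = io.StringIO(
    "territory,gender,current_premium,selected_premium\n"
    "101,M,1000,1100\n"
    "102,F,800,760\n"
)
territory_defs = io.StringIO(
    "territory,county,zipcode,town\n"
    "101,CountyA,00001,TownA\n"
    "101,CountyA,00002,TownB\n"
)
rows = join_on_territory(premiums, territory_defs)
for r in rows:
    print(r["zipcode"], round(r["selected_vs_current"], 3))
```

Territory `102` has no definition in the sample data, so it is dropped, as in an inner join; with the real files, pass open file handles instead of `io.StringIO` objects.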
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is formatted as a spreadsheet covering the primary activities of a non-life motor insurance portfolio over three full years (November 2015 to December 2018). The dataset comprises 105,555 rows and 30 columns: each row represents a policy transaction, and each column represents a distinct variable.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Data provided by insurers on the premiums written and claims incurred for the 2013 fiscal year, based on reporting on the consolidated pages of the P&C-1 or Life-1 annual returns. This data is also reported in the Superintendent of Insurance’s Annual Report.
The frequency of private passenger comprehensive auto insurance claims for physical damage in the United States rose to **** per 100 car years in 2023, compared to *** in 2020. This was the highest frequency recorded over the past 15 years.
Your client is a car insurance company. They want to price their car insurance competitively, which means having a good model for customers at risk of getting into accidents.
Each row corresponds to a customer, the outcome column records whether the customer made a claim in the previous year or not. The client has informed you that the other columns should be self-explanatory.
The client is interested in knowing whether the customer data can be used to predict the likelihood that a claim is made in the next year. Your task is to investigate this and make a recommendation. You should complete the following tasks:
Louisiana had the most expensive annual car insurance premiums at ***** U.S. dollars for full coverage. Alaska ranked first for minimum coverage, with the highest annual cost at *** U.S. dollars.

Why it varies state by state
The huge variance in premiums between states is due to differences in state laws, the percentage of uninsured drivers in each state, the frequency of natural disasters, and claim rates. For instance, Michigan has a no-fault car insurance system, which means that claims are more common. This drives up the cost of insurance for all drivers because insurers need to pay out more money in claims.

Male drivers also pay more
Premiums also differ by age and gender. In 2025, 25-year-old male drivers paid more per month than 25-year-old female drivers, due to the higher incidence of accidents among young male drivers. As a result, young drivers in states that already have higher premiums must pay a lot for car insurance.
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This event log has been artificially generated and curated to provide a comprehensive view of car insurance claims, allowing users to discover and identify bottlenecks, automation opportunities, conformance issues, reworks, and potential fraudulent cases using any process mining software.
You can find more event logs here: https://processminingdata.com/JfVPOR
Standard Process flow: “First Notification of Loss (FNOL)” -> “Assign Claim” -> “Claim Decision” -> “Set Reserve” -> “Payment Sent” -> “Close Claim”
Attributes:
- case ID
- activity name
- timestamp
- claimant name
- agent name
- adjuster name
- claim amount
- claimant age
- type of policy
- car make
- car model
- car year
- date and time of the accident
- type of accident
- user type
Total number of claims: 30,000
Dates: Claims belong to years 2020, 2021, and 2022.
Disclaimer: Personal names are fake.
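A first pass at spotting conformance issues and rework in a log like this can be done without process mining software: group events by case, order them by timestamp, and compare each trace to the standard flow. This is a minimal sketch, assuming each event is a hypothetical `(case_id, activity, timestamp)` tuple with ISO 8601 timestamp strings in one format (so they sort chronologically); the function names are illustrative. Exact matching is deliberately simple; dedicated conformance-checking techniques such as token-based replay or alignments give finer-grained diagnostics.

```python
from collections import defaultdict

# The standard process flow described above, in order.
STANDARD_FLOW = [
    "First Notification of Loss (FNOL)",
    "Assign Claim",
    "Claim Decision",
    "Set Reserve",
    "Payment Sent",
    "Close Claim",
]


def traces_by_case(events):
    """Group (case_id, activity, timestamp) events into time-ordered activity lists."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        cases[case_id].append((timestamp, activity))
    # Sort each case's events by timestamp (ties fall back to activity name).
    return {cid: [act for _, act in sorted(evts)] for cid, evts in cases.items()}


def check_trace(trace):
    """Return (follows_standard_flow, activities_repeated_as_rework)."""
    conforms = trace == STANDARD_FLOW
    rework = sorted({act for act in trace if trace.count(act) > 1})
    return conforms, rework


# Hypothetical events for three claims.
events = [("C1", act, f"2021-03-01T{9 + i:02d}:00") for i, act in enumerate(STANDARD_FLOW)]
events += [
    ("C2", act, f"2021-03-02T{9 + i:02d}:00")
    for i, act in enumerate(STANDARD_FLOW[:3] + ["Claim Decision"] + STANDARD_FLOW[3:])
]
events += [
    ("C3", act, f"2021-03-03T{9 + i:02d}:00")
    for i, act in enumerate(STANDARD_FLOW[:3] + ["Close Claim"])
]

for case_id, trace in traces_by_case(events).items():
    print(case_id, check_trace(trace))
```

Here C2 repeats "Claim Decision" (rework), and C3 closes without "Set Reserve" or "Payment Sent". C3 could be a legitimately denied claim rather than an error, which exact matching alone cannot distinguish.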
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Insurance Assessment: This model can be used by insurance companies to automate the process of assessing car damage in insurance claims. By simply using photographs of the damaged vehicle, the model can identify the type and extent of damage, making the claim processing faster and more objective.
Automotive Repair Estimates: Car repair shops can use this model to get an approximate idea of the damage and therefore provide a more accurate cost estimate for their clients. It can also assist in identifying non-obvious damage.
Used Car Market Evaluation: This model can be used in used car platforms to evaluate the current condition of the cars listed for sale. By identifying existing damage, buyers can make more informed decisions and sellers can price their vehicles more accurately.
Law Enforcement and Road Safety: Traffic police and accident investigation teams can use this model to evaluate the types of damage after a road accident. It can assist in reconstructing the accident scenario and provide insights during investigations.
Auto-manufacturing Quality Control: Automobile manufacturers can use this model in their factories to automatically inspect new cars for any damage or misaligned/missing parts before they are dispatched from the factory, ensuring quality control.
https://creativecommons.org/publicdomain/zero/1.0/
Vehicle insurance is insurance for cars, trucks, motorcycles, and other road vehicles. Its main purpose is to provide financial protection against the following:
- Physical damage or bodily injury caused by traffic collisions
- Liability that could arise from incidents in a vehicle
Vehicle insurance may additionally offer financial protection against theft of the vehicle, against damage to the vehicle sustained from events other than traffic collisions (such as keying, weather, or natural disasters), and against damage sustained by colliding with stationary objects.
You have been hired as a Machine Learning expert by a leading car insurance company. Your task is to predict the insurance claim for each car in the dataset.
The dataset folder contains the following:
- trainImages folder: contains 1399 training images
- testImages folder: contains 600 testing images
- train.csv: 1399 rows × 8 columns
- test.csv: 600 rows × 6 columns
- sample_submission.csv: 5 rows × 3 columns
Column Name | Description |
---|---|
Image Path | Represents the name of the image |
Insurance_company | Represents masked values of some insurance companies |
Cost of Vehicles | Represents the cost of a vehicle present in the image |
Min Coverage | Represents the minimum coverage provided by an insurance company |
Expiry Date | Represents the expiry date of the insurance |
Max Coverage | Represents the maximum coverage provided by an insurance company |
Condition | Represents whether a vehicle is damaged |
Amount | Represents the insurance amount of a vehicle |
The dataset includes parameters such as images of damaged cars, the cost price of each car, and its insurance claim. The benefits of practicing this problem using Machine Learning techniques are as follows:
The New York State Department of Financial Services (DFS) ranks automobile insurance companies doing business in New York State by the number of consumer complaints upheld against them as a percentage of their total business over a two-year period. Complaints typically involve issues such as delays in the payment of no-fault claims and nonrenewal of policies. Insurers with the fewest upheld complaints per million dollars of premiums appear at the top of the list; those with the highest complaint ratios are ranked at the bottom.
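The ranking rule above can be expressed as a ratio and a sort. This is a minimal sketch, assuming the ratio is upheld complaints divided by premiums expressed in millions of dollars, with both figures covering the same two-year window; the exact DFS methodology is not given here, and the insurer names and figures are made up.

```python
def rank_by_complaint_ratio(insurers):
    """Rank insurers by upheld complaints per $1 million of premiums (lowest first).

    `insurers` is an iterable of (name, upheld_complaints, premiums_usd) tuples,
    assumed to cover the same two-year window.
    """
    ratios = []
    for name, upheld, premiums_usd in insurers:
        ratio = upheld / (premiums_usd / 1_000_000)
        ratios.append((name, ratio))
    # Fewest upheld complaints per million dollars of premiums at the top.
    return sorted(ratios, key=lambda item: item[1])


# Hypothetical insurers and figures, for illustration only.
ranking = rank_by_complaint_ratio([
    ("Insurer A", 10, 50_000_000),
    ("Insurer B", 3, 30_000_000),
    ("Insurer C", 40, 100_000_000),
])
for position, (name, ratio) in enumerate(ranking, start=1):
    print(position, name, f"{ratio:.2f}")
```

Normalizing by premium volume is what lets a large insurer with more complaints in absolute terms still rank above a small one.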
This file contains ultimate claims data taken from the private motor National Claims Information Database (NCID). The claims are grouped by accident year, the year in which the accident occurred. Not all claims are paid within the lifetime of the policy; some claims, injury claims in particular, can take many years to be settled and fully paid. Insurers estimate the cost and number of claims expected for a particular accident year, and this is known as the ultimate cost/number of claims. The ultimate cost/number of claims is recalculated regularly based on the most up-to-date information available. The more time that has passed since the accident year, the more certain the ultimate cost of claims becomes. To view the detailed NCID report, refer to the Central Bank publication link in the Landing Page section under Additional Info.
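The description does not say how insurers arrive at their ultimate estimates. One widely used actuarial technique is the chain ladder: cumulative claims for each accident year are laid out in a run-off triangle, volume-weighted development factors are derived from older accident years, and those factors project younger years to their ultimate. The sketch below is a minimal illustration under that assumption, with a made-up triangle, a hypothetical function name, and the simplification that the oldest accident year is fully developed (no tail factor).

```python
def chain_ladder(triangle):
    """Project ultimate claims from a cumulative run-off triangle.

    triangle[i][j] is the cumulative claims amount for accident year i after
    j + 1 development periods; later accident years have fewer entries.
    Returns (age_to_age_factors, ultimates).
    """
    n_periods = max(len(row) for row in triangle)
    factors = []
    for j in range(n_periods - 1):
        # Volume-weighted development factor from period j to j + 1, using
        # only accident years observed at both periods.
        observed = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(row[j + 1] for row in observed) / sum(row[j] for row in observed))

    ultimates = []
    for row in triangle:
        ultimate = row[-1]
        # Apply the factors for the periods this accident year has not yet reached.
        for factor in factors[len(row) - 1:]:
            ultimate *= factor
        ultimates.append(ultimate)
    return factors, ultimates


# Hypothetical cumulative claims for three accident years.
triangle = [
    [100.0, 150.0, 165.0],  # oldest accident year: treated as fully developed
    [110.0, 165.0],
    [120.0],                # most recent accident year: least developed
]
factors, ultimates = chain_ladder(triangle)
print([round(f, 3) for f in factors], [round(u, 1) for u in ultimates])
```

The most recent accident year depends on the most projected factors, which mirrors the point above that estimates become more certain as time passes and each year's observed claims replace projections.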
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains premium data taken from the private motor National Claims Information Database (NCID). Premiums and policy numbers are presented on a “written” and “earned” policy basis and further broken down by different levels of cover - comprehensive and third party. To view the detailed NCID report refer to the Central Bank publication link under Additional Info.
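Written premium is recorded when a policy is sold, whereas earned premium accrues as the period of cover elapses. The NCID description does not state which earning convention is used; the sketch below assumes the common pro-rata (daily) convention and half-open date intervals, and the function name and policy figures are hypothetical.

```python
from datetime import date


def earned_premium(written, policy_start, policy_end, period_start, period_end):
    """Pro-rata (daily) share of `written` premium earned within a period.

    Dates are treated as half-open intervals [start, end), so a policy running
    from 2022-07-01 to 2023-07-01 covers 365 days.
    """
    term_days = (policy_end - policy_start).days
    overlap_start = max(policy_start, period_start)
    overlap_end = min(policy_end, period_end)
    overlap_days = max(0, (overlap_end - overlap_start).days)
    return written * overlap_days / term_days


# Hypothetical annual policy written in 2022 for a premium of 365.0.
start, end = date(2022, 7, 1), date(2023, 7, 1)
earned_2022 = earned_premium(365.0, start, end, date(2022, 1, 1), date(2023, 1, 1))
earned_2023 = earned_premium(365.0, start, end, date(2023, 1, 1), date(2024, 1, 1))
print(earned_2022, earned_2023)
```

The whole 365.0 counts as written in 2022, but only the days of cover falling in each calendar year are earned in that year, so the two earned amounts add back up to the written premium.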
https://data.gov.sg/open-data-licence
Dataset from Singapore Department of Statistics. For more information, visit https://data.gov.sg/datasets/d_abcfd12381e7f8d175280d999cdb2dea/view
https://choosealicense.com/licenses/gpl-2.0/
freMTPL2 Dataset
This dataset is a mirror of the freMTPL2 frequency and severity datasets, originally published by Arthur Charpentier to accompany his textbook Computational Actuarial Science with R. The freMTPL2 dataset contains data on Third-Party Liability (TPL) Motor insurance policies issued in France, along with claims filed against those policies, observed over a duration of just over a year. These observations are organized into two separate CSV files:
freMTPL2freq.csv: a… See the full description on the dataset page: https://huggingface.co/datasets/mabilton/fremtpl2.
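A typical first step with frequency data is the empirical claim frequency: total claims divided by total exposure, rather than an average of per-policy rates, because policies are observed for different lengths of time. This sketch assumes the frequency file uses the `ClaimNb` (claim count) and `Exposure` (policy-years) column names found in commonly distributed versions of freMTPL2freq; check them against the dataset page. The `Area` grouping and sample rows are illustrative only.

```python
import csv
import io
from collections import defaultdict


def claim_frequency(rows, by=None):
    """Claims per unit of exposure, overall or grouped by one column.

    Assumes each row has `ClaimNb` (number of claims) and `Exposure`
    (policy duration in years) columns.
    """
    claims = defaultdict(float)
    exposure = defaultdict(float)
    for row in rows:
        key = row[by] if by else "all"
        claims[key] += float(row["ClaimNb"])
        exposure[key] += float(row["Exposure"])
    return {key: claims[key] / exposure[key] for key in claims}


# Made-up rows in the assumed layout; with the real file, pass
# csv.DictReader(open("freMTPL2freq.csv", newline="")) instead.
sample = io.StringIO(
    "IDpol,ClaimNb,Exposure,Area\n"
    "1,0,1.0,A\n"
    "2,1,0.5,A\n"
    "3,0,0.5,B\n"
)
rows = list(csv.DictReader(sample))
print(claim_frequency(rows))
print(claim_frequency(rows, by="Area"))
```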
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This table contains values from Compare.com's proprietary database of car insurance quotes about average car insurance costs.
https://creativecommons.org/publicdomain/zero/1.0/
Vehicle insurance fraud involves conspiring to make false or exaggerated claims involving property damage or personal injuries following an accident. Common examples include staged accidents, where fraudsters deliberately “arrange” for accidents to occur; phantom passengers, where people who were not even at the scene of the accident claim to have suffered grievous injury; and false personal injury claims, where personal injuries are grossly exaggerated.
This dataset contains vehicle details (attributes, model, accident details, etc.) along with policy details (policy type, tenure, etc.). The target is to detect whether a claim application is fraudulent, as recorded in the FraudFound_P column.
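Because fraudulent claims are usually a minority, accuracy alone can look good for a model that never flags fraud, so precision and recall on the fraud class are more informative. This is a minimal sketch, assuming FraudFound_P is coded 1 for fraudulent and 0 otherwise (verify against the data); the labels and predictions are made up.

```python
def fraud_metrics(y_true, y_pred):
    """Accuracy, precision and recall for a binary fraud label (1 = fraudulent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when nothing is flagged or nothing is fraud.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical FraudFound_P labels: 2 fraudulent claims out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_legit = [0] * 10                  # never flags fraud
model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # a hypothetical model's predictions
print(fraud_metrics(y_true, always_legit))
print(fraud_metrics(y_true, model))
```

Both predictors reach the same accuracy, but only the second catches any fraud, which the recall figure makes visible.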
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Switzerland Non Life Insurance: Claims Paid: Liability and Motor data was reported at 4,676.000 CHF mn in 2016, a decrease from 4,802.000 CHF mn in 2015. The data is updated yearly, with a median of 4,628.000 CHF mn over 17 observations from Dec 2000 to 2016. It reached an all-time high of 4,918.000 CHF mn in 2009 and a record low of 3,844.000 CHF mn in 2000. The series remains in active status in CEIC and is reported by the Swiss Financial Market Supervisory Authority. The data is categorized under Global Database’s Switzerland – Table CH.RG011: Non Life Insurance: Claims Paid.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Insurance Claim Processing: The model can be used to expedite processing of insurance claims by quickly categorizing whether a vehicle has been damaged or not from uploaded incident photos. This can help insurance agents prioritize claims and perform investigations more efficiently.
Vehicle Rental Services: This model can be useful for rental agencies to automatically validate the state of their vehicles when they are returned by customers. It will allow them to spot any new damage without the need for manual inspection.
Online Marketplace Quality Control: Online platforms for buying/selling used cars can provide an additional layer of quality control before listings go live. Sellers can submit photos of their vehicles which are then analyzed using this model to verify the condition of the car.
Traffic Management and Law Enforcement: The model can be used by traffic authorities or law enforcement agencies to automatically identify and classify damaged vehicles from CCTV or drone footage during accidents, which can assist in accident location, investigation and traffic management.
Automated Driving Systems: In autonomous vehicles, this solution can be an integral part of the system to detect and avoid damaged cars on the road, contributing to safer driving conditions.
This data set measures and describes participation in the Point and Insurance Reduction Program (PIRP). Researchers can determine how many motorists have completed the course and tabulate subsets by: year and month of course completion; motorist residency, age, and sex; and course provider and delivery method.
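The tabulations described above amount to counting completions per combination of fields. This is a minimal sketch using `collections.Counter`; the record layout and field names (`year`, `month`, `age_group`, `sex`, `delivery`) are hypothetical stand-ins for the data set's actual columns.

```python
from collections import Counter


def tabulate(records, *fields):
    """Count course completions for each combination of the given fields."""
    return Counter(tuple(record[field] for field in fields) for record in records)


# Hypothetical completion records; field names are illustrative only.
completions = [
    {"year": 2023, "month": 1, "age_group": "25-34", "sex": "F", "delivery": "Classroom"},
    {"year": 2023, "month": 1, "age_group": "35-44", "sex": "M", "delivery": "Internet"},
    {"year": 2023, "month": 2, "age_group": "25-34", "sex": "M", "delivery": "Internet"},
]
print(tabulate(completions, "year", "month"))
print(tabulate(completions, "delivery"))
```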
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accurate forecasting of claim frequency in automobile insurance is essential for insurers to assess risks effectively and establish appropriate pricing policies. Traditional methods typically rely on a Poisson distribution for modeling claim counts; however, this approach can be inadequate due to frequent zero-claim periods, leading to zero inflation in the data. Zero inflation occurs when more zeros are observed than expected under standard Poisson or negative binomial (NB) models. While machine learning (ML) techniques have been explored for predictive analytics in other contexts, their application to zero-inflated insurance data remains limited. This study investigates the utility of ML in improving forecast accuracy under conditions of zero-inflation, a data characteristic common in automobile insurance. The research involved a comparative evaluation of several models, including Poisson, NB, zero-inflated Poisson (ZIP), hurdle Poisson, zero-inflated negative binomial (ZINB), hurdle negative binomial, random forest (RF), support vector machine (SVM), and artificial neural network (ANN) on an insurance dataset. The performance of these models was assessed using mean absolute error. The results reveal that the SVM model outperforms others in predictive accuracy, particularly in handling zero-inflation, followed by the ZIP and ZINB models. In contrast, the traditional Poisson and NB models showed lower predictive capabilities. By addressing the challenge of zero-inflation in automobile claim data, this study offers insights into improving the accuracy of claim frequency predictions. Although this study is based on a single dataset, the findings provide valuable perspectives on enhancing prediction accuracy and improving risk management practices in the insurance industry.
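The abstract names the zero-inflated Poisson (ZIP) model and mean absolute error without defining them. Under the standard definition, a ZIP count is a structural zero with probability pi and otherwise a Poisson(lam) draw, so P(Y = 0) = pi + (1 - pi) * exp(-lam) and P(Y = k) = (1 - pi) * exp(-lam) * lam^k / k! for k >= 1. This sketch compares zero probabilities for a ZIP and a plain Poisson with the same mean; the parameter values are hypothetical and are not the study's fitted results. (A hurdle model differs in that all zeros come from the hurdle part.)

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson(lam) count."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) draw (which can itself be zero)."""
    p = (1 - pi) * poisson_pmf(k, lam)
    return p + pi if k == 0 else p


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical parameters: both models have mean claim count 0.2.
lam_zip, pi = 0.5, 0.6  # ZIP mean = (1 - pi) * lam = 0.2
print(f"Poisson P(0) = {poisson_pmf(0, 0.2):.3f}")
print(f"ZIP     P(0) = {zip_pmf(0, lam_zip, pi):.3f}")
```

For the same mean, the ZIP always puts at least as much probability on zero as the Poisson (by convexity of the exponential), which is why it can accommodate the excess zero-claim periods the abstract describes.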