OVERVIEW
In March 2019, Poverty Solutions released “AUTO INSURANCE AND ECONOMIC MOBILITY IN MICHIGAN: A CYCLE OF POVERTY,” a policy brief detailing the sources of Michigan’s highest-in-the-nation auto insurance rates and providing policy options for policymakers seeking to enact changes that would reduce overall rates and reduce rate disparities. The report pulled data from The Zebra, an auto insurance comparison marketplace, to show the distribution of rates by ZIP code and to calculate a cost burden for each ZIP code.

DATA
The Zebra provides ZIP-code-level data on average auto insurance rates from 2011-2017. The data represents an average of market prices facing a consistent base consumer profile. According to The Zebra, “Analysis used a consistent base profile for the insured driver: a 30-year-old single male driving a 2014 Honda Accord EX with a good driving history and coverage limits of $50,000 bodily injury liability per person/$100,000 bodily injury liability per accident/$50,000 property damage liability per accident with a $500 deductible for comprehensive and collision.”[1] For more information on The Zebra’s data collection methodology, go to www.thezebra.com. Click here for metadata (descriptions of the fields).
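As an illustration of the cost-burden calculation described above, the burden for a ZIP code can be expressed as the ratio of the average premium to median household income. This is a hypothetical sketch: the column names and figures below are illustrative, not The Zebra's actual schema.

```python
import pandas as pd

# Hypothetical ZIP-level data; column names and values are illustrative only.
rates = pd.DataFrame({
    "zip_code": ["48201", "48127", "49503"],
    "avg_annual_premium": [5300.0, 4100.0, 1800.0],
    "median_household_income": [26000.0, 52000.0, 61000.0],
})

# Cost burden: share of median household income consumed by the average premium.
rates["cost_burden"] = rates["avg_annual_premium"] / rates["median_household_income"]

# Rank ZIP codes from most to least burdened.
print(rates.sort_values("cost_burden", ascending=False))
```

Under this definition, a low-income ZIP code with a high premium shows a disproportionately large burden even when its absolute premium is not the highest.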
This statistic shows opinions on the influence of ZIP code on car insurance rates in the United States in 2016. The survey results revealed that ** percent of Gen X respondents believed that ZIP code had an effect on car insurance rates.
This dataset contains insurance rate data from across the United States, providing insight into the premiums charged by insurers, the underlying factors that affect those rates, and claims history analysis. It is designed to help researchers understand the inner workings of the insurance industry and how rates are calculated, and includes information on premiums, underlying factors, current premium prices, indicated premium prices, selected premium prices, fixed expenses, and more. It can be used to research insurance rates across the United States and to understand how those rates are determined.
- Understand the inner workings of the insurance industry, and how rates are calculated
- Help insurance companies better understand their own pricing models
- Help policyholders understand how their premiums are calculated
I would like to acknowledge The Markup for providing the data for this dataset
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: cgr-definitions-table.csv

| Column name | Description |
|:------------|:----------------------------------|
| cgr | Combined grade rating. (Numeric) |
| aa | Average annual premium. (Numeric) |
| bb | Base premium. (Numeric) |
| cc | Cost of capital. (Numeric) |
| va | Value of assets. (Numeric) |
| dd | Direct written premium. (Numeric) |
| hh | Homeownership. (Categorical) |
File: cgr-premiums-table.csv

| Column name | Description |
|:-------------------------|:--------------------------------------------------|
| territory | The territory in which the person lives. (String) |
| gender | The person's gender. (String) |
| birthdate | The person's birthdate. (Date) |
| ypc | The person's years of prior coverage. (Integer) |
| current_premium | The person's current premium. (Float) |
| indicated_premium | The person's indicated premium. (Float) |
| selected_premium | The person's selected premium. (Float) |
| underlying_premium | The person's underlying premium. (Float) |
| fixed_expenses | The person's fixed expenses. (Float) |
| underlying_total_premium | The person's underlying total premium. (Float) |
| cgr_factor | The person's CGR factor. (Float) |
File: territory-definitions-table.csv

| Column name | Description |
|:------------|:--------------------------------------------------------------------|
| territory | The territory in which the person lives. (String) |
| county | The county in which the person lives. (String) |
| county_code | The county code for the county in which the person lives. (String) |
| zipcode | The zip code for the county in which the person lives. (String) |
| town | The town in which the person lives. (String) |
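Since the premiums table and the territory definitions table share a `territory` key, premium records can be joined to geography. A minimal pandas sketch using toy rows that follow the documented column names (the values themselves are made up):

```python
import pandas as pd
from io import StringIO

# Toy stand-ins for cgr-premiums-table.csv and territory-definitions-table.csv,
# using the column names documented above; the rows are illustrative only.
premiums = pd.read_csv(StringIO(
    "territory,gender,current_premium\n"
    "T01,F,812.50\n"
    "T02,M,1044.00\n"
))
territories = pd.read_csv(StringIO(
    "territory,county,zipcode,town\n"
    "T01,Hartford,06101,Hartford\n"
    "T02,New Haven,06510,New Haven\n"
), dtype={"zipcode": str})  # read zip codes as strings to keep leading zeros

# Attach geography to each premium record via the shared territory key.
merged = premiums.merge(territories, on="territory", how="left")
print(merged[["territory", "town", "zipcode", "current_premium"]])
```

A left join keeps every premium record even if a territory code is missing from the definitions file, which is usually the safer default when auditing rate data.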
Auto insurance rate data for Progressive across 1,939 ZIP codes in Texas. Median 6-month premium: $1,010.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This table uses Insurify's proprietary quote data to show GEICO, Allstate, and State Farm's cheapest monthly liability-only costs.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This table uses Insurify's proprietary quote data to show State Farm, USAA, and Liberty Mutual's cheapest monthly liability-only costs for major cities in Maryland.
Advanced Business Analytics - Power of Predictive Modeling
Course ID: 226161-D; course type: lab. Project 2, deadline: 27 January 2023.

You are employed as a data scientist at a non-life insurance company with a focus on the motor insurance business. Your manager has asked you to analyze the data on claims reported by clients in the first quarter of 2023 in order to perform customer segmentation and determine the key risk drivers of claims severity. The main purpose of the analysis is to allow the actuarial department to better assess the risk for the given line of business, as well as to identify potential frauds requiring further investigation by the company's claims handling department.
Feature descriptions:

| Feature name | Description |
|:---|:---|
| cust_age | Age of the customer |
| policy_id | Insurance policy ID |
| coverage_start_date | Insurance coverage start date |
| cust_region | Customer region |
| sum_assured_group | Sum assured group (low, medium, or high sum assured) |
| ins_deductible | Insurance deductible (the amount deducted from the sum paid by the insurer if the claim is accepted) |
| annual_prem | Annual premium |
| zip_code | Postal code |
| insured_sex | Gender |
| edu_lvl | Education level |
| marital_status | Marital status |
| claim_incurred_date | Date when the claim was incurred |
| claim_type | Incident type |
| acc_type | Accident type (if applicable) |
| emg_services_notified | Whether emergency services were notified about the incident |
| incident_city | City where the incident occurred |
| incident_hour | Hour when the incident occurred |
| num_vehicles_involved | Number of vehicles involved |
| property_damage | Whether property damage (other than car damage) occurred |
| bodily_injuries | Number of people affected by bodily injuries due to the incident |
| witnesses | Number of witnesses to the incident |
| police_report_avlbl | Whether a police report on the incident is available |
| total_claim_amount | Total claim amount |
| injury_claim | Amount of claim related to bodily injury |
| property_claim | Property claim amount |
| vehicle_claim | Vehicle claim amount |
| car_brand | Car brand |
| car_model | Car model |
| production_year | Car production year |
Task description: Based on the above dataset, complete the following tasks:

1. Exploratory data analysis and feature engineering. Conduct exploratory data analysis (e.g., missing values, descriptive statistics of the characteristics and their distributions). Create new features that can provide additional information about the analyzed customer portfolio. Analyze the relationships between the features and generate appropriate visualizations. Based on the analyses performed, select the variables you will use to build the segmentation model and briefly justify your choice.
2. Building the segmentation model. Using the selected variables (making relevant transformations if necessary), build a segmentation model using the K-means method. Briefly justify the choice of the optimal number of clusters, as well as the choice of optimal cluster initialization points.
3. Business analysis. Describe the groups of insured persons identified by the model and interpret the statistics for individual segments and their business characteristics. Visualize the segments.
4. Anomaly detection*. Using any anomaly detection algorithm, try to identify unusual claims/damage reports that may be attempts to extort compensation from the insurance company. Indicate a maximum of 5 observations identified in this way and briefly justify your choice.

Score to obtain: 10 points maximum, of which:
- Correctness of results and interpretation: 3 (weight: x2)
- Programming (reproducibility of results, code readability, comments): 2
- Aesthetics of work and completeness of materials: 1
- Innovation of the proposed solution: 1
An additional 2 points are available for the extra task (*), i.e., a maximum of 12/10.

Task submission: Please upload your solution to the MS Teams group. The solution should include:
a. A text file (.pdf) with the solution to the task, analysis results, model parameter estimates, and visualizations, along with descriptions and conclusions. All authors should be listed at the very beginning of the report.
b. Program code (SAS, R, or Python) containing the definition of the SAS library (if applicable)/working directory and the libraries used (if applicable) at the beginning of the program code, to enable reproduction of results after changing the working directory/input data path.
c. A printout containing the default set of results obtained using the prepared program code.
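As a sketch of tasks 2 and 4 above, the following uses scikit-learn's K-means (comparing inertia across candidate cluster counts, with the default k-means++ initialization) plus an isolation forest for anomaly flagging. The data is synthetic, standing in for three of the documented features (customer age, annual premium, total claim amount); it is not the course dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the claims data: two latent customer groups over
# cust_age, annual_prem, and total_claim_amount.
X = np.vstack([
    rng.normal([30, 600, 2000], [5, 80, 500], size=(100, 3)),
    rng.normal([55, 1200, 8000], [5, 150, 1500], size=(100, 3)),
])

# K-means is scale-sensitive, so standardize the features first.
X_std = StandardScaler().fit_transform(X)

# Compare inertia (within-cluster sum of squares) over candidate k;
# an "elbow" in these values motivates the choice of cluster count.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_std).inertia_
            for k in range(2, 6)}

# Fit the final segmentation model.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_std)

# For the anomaly-detection task, flag the 5 least typical observations.
iso = IsolationForest(random_state=0).fit(X_std)
suspects = np.argsort(iso.score_samples(X_std))[:5]

print(inertias)
print(np.bincount(model.labels_))
print(suspects)
```

In practice the categorical features in the course data would need encoding (or a method such as k-prototypes) before clustering; this sketch only covers the numeric case.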
Dataset Description: This dataset contains detailed information on traffic collisions in New York City (NYC) during the years 2021 and 2022. It includes data on the crash date and time, geographic location, contributing factors, and the types of vehicles involved. The dataset is designed to provide insights into traffic accidents and their patterns, enabling various analyses, such as the impact of weather, time of day, or road conditions on collisions.
Key Features:
CRASH DATE: The date when the accident occurred (MM/DD/YYYY or DD-MM-YYYY format).
CRASH TIME: The time when the crash took place (HH:MM:SS).
COLLISION_ID: A unique identifier for each collision.
BOROUGH: The borough in NYC where the crash occurred (e.g., Brooklyn, Manhattan, Queens, Bronx).
ZIP CODE: The zip code associated with the crash location.
LATITUDE: Latitude coordinate of the crash location.
LONGITUDE: Longitude coordinate of the crash location.
LOCATION: Combined latitude and longitude as a coordinate pair.
OFF STREET NAME: The street name where the crash occurred.
NUMBER OF PERSONS INJURED: Total number of individuals injured in the crash.
NUMBER OF PERSONS KILLED: Total number of fatalities in the crash.
NUMBER OF PEDESTRIANS INJURED: Number of pedestrians injured in the crash.
NUMBER OF PEDESTRIANS KILLED: Number of pedestrian fatalities in the crash.
NUMBER OF CYCLISTS INJURED: Number of cyclists injured in the crash.
NUMBER OF CYCLISTS KILLED: Number of cyclist fatalities in the crash.
NUMBER OF MOTORISTS INJURED: Number of motor vehicle operators or passengers injured.
CONTRIBUTING FACTOR: The main cause or contributing factor for the crash (e.g., Driver Inexperience, Passing Too Closely, Driver Inattention).
VEHICLE TYPE: Type of vehicle involved in the crash (e.g., Sedan, Tanker, Station Wagon/Sport Utility Vehicle, Taxi).
Dataset Overview: This dataset focuses on traffic collisions in New York City, with key variables capturing both the location of the crash (including latitude, longitude, and street name) and the severity of the incident (injuries and fatalities). In addition, the dataset also contains information on the contributing factors and vehicle types involved, making it useful for identifying patterns and causes of traffic accidents.
The data can be used to explore various trends in vehicle collisions such as:
Time and date trends of traffic accidents.
Geospatial analysis of accident hotspots in New York City.
Analysis of accident severity based on contributing factors (e.g., driver distraction, speeding).
Comparison of injury rates across different types of vehicles.
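For example, time-of-day trends and common contributing factors can be tabulated with pandas. The rows below are a made-up miniature of the schema described above, just to show the mechanics:

```python
import pandas as pd

# Tiny illustrative frame mimicking the collision schema; not real NYC data.
crashes = pd.DataFrame({
    "CRASH TIME": ["08:15:00", "17:40:00", "17:05:00", "02:30:00"],
    "BOROUGH": ["BROOKLYN", "QUEENS", "BROOKLYN", "MANHATTAN"],
    "CONTRIBUTING FACTOR": ["Driver Inattention", "Driver Inattention",
                            "Passing Too Closely", "Driver Inexperience"],
    "NUMBER OF PERSONS INJURED": [1, 0, 2, 1],
})

# Time-of-day trend: number of crashes per hour.
crashes["hour"] = pd.to_datetime(crashes["CRASH TIME"], format="%H:%M:%S").dt.hour
by_hour = crashes.groupby("hour").size()

# Most common contributing factors.
factor_counts = crashes["CONTRIBUTING FACTOR"].value_counts()

print(by_hour)
print(factor_counts.head())
```

The same groupby pattern extends to borough-level injury totals or any of the other trend questions listed above.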
Use Cases: Traffic Safety Analysis: Study accident trends to identify high-risk areas, times, or factors contributing to crashes. This can inform policies to improve road safety in NYC.
Urban Planning & Infrastructure Design: Use the data to understand where traffic collisions are most likely to occur, helping to guide investments in infrastructure improvements (e.g., better signage, road design changes).
Predictive Modeling: Leverage machine learning to predict the likelihood of crashes based on certain variables such as time of day, contributing factors, or vehicle type.
Public Awareness Campaigns: Use the data to help design targeted public safety campaigns focused on reducing accidents in particular boroughs or at specific times.
Insurance & Risk Assessment: Insurance companies can use this dataset to better understand risk patterns in different areas and offer more accurate pricing.
Data Quality: The dataset is derived from real-world traffic collision reports in New York City.
The data spans from late 2021 to early 2022, offering a recent snapshot of NYC’s traffic accident patterns.
Data may have missing entries, such as latitude/longitude for some records. These rows should be handled appropriately when used for analysis.
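A minimal example of the handling mentioned above, dropping records that lack coordinates before any geospatial analysis (the rows are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative rows: one record is missing its coordinates.
df = pd.DataFrame({
    "COLLISION_ID": [1, 2, 3],
    "LATITUDE": [40.71, np.nan, 40.68],
    "LONGITUDE": [-74.00, np.nan, -73.97],
})

# Keep only records with both coordinates present before mapping or clustering.
geo = df.dropna(subset=["LATITUDE", "LONGITUDE"])
print(geo)
```

For non-geospatial analyses (e.g., counts by borough or factor), those rows can often be retained instead of dropped.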
Dataset Licensing: The dataset is provided under an open license, allowing users to freely access, share, and use the data for analysis, research, and educational purposes.
Example Use Cases: Geospatial Distribution of Traffic Accidents: By mapping the latitude and longitude coordinates, you can visualize accident hotspots across NYC.
Impact of Contributing Factors: Compare how often specific factors like “Driver Inexperience” or “Oversized Vehicle” contribute to accidents, helping to identify the most common causes of traffic incidents.
Crash Severity Analysis: Analyze the relationship between vehicle type (e.g., Sedan, SUV, Taxi) and the number of injuries or fatalities, potentially influencing vehicle safety features or public transportation policies.
Conclusion: This dataset is a rich resource for those interested in studying traffic accidents, vehicle types, and contributing factors in New York City. Whether for academic research, urban planning, policy-making, or insurance purposes, this dataset provides valuable insights into NYC’s traffic safety landscape.