The Car Allowance Rebate System (CARS), otherwise known as Cash for Clunkers, was a program intended to provide economic incentives to United States residents to purchase a new, more fuel-efficient vehicle when trading in a less fuel-efficient vehicle. The program was promoted as providing stimulus to the economy by boosting auto sales, while putting safer, cleaner and more fuel-efficient vehicles on the road.
This dataset contains features of existing vehicle models. Before making predictions for a new kind of car model, we want to determine which existing vehicles on the market are most like the new models, how vehicles can be grouped, and which group is most similar to the new models.
Data tables containing aggregated information about vehicles in the UK are also available.
A number of changes were introduced to these data files in the 2022 release to help meet the needs of our users and to provide more detail.
Fuel type has been added to:
Historic UK data has been added to:
A new datafile, df_VEH0520, has been added.
We welcome any feedback on the structure of our data files, their usability, or any suggestions for improvements; please contact vehicles statistics.
CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).
When using as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 58.1 MB) https://assets.publishing.service.gov.uk/media/68494aca74fe8fe0cbb4676c/df_VEH0120_GB.csv
Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 34.1 MB) https://assets.publishing.service.gov.uk/media/68494acb782e42a839d3a3ac/df_VEH0120_UK.csv
Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 24.8 MB) https://assets.publishing.service.gov.uk/media/68494ad774fe8fe0cbb4676d/df_VEH0160_GB.csv
Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 8.26 MB) https://assets.publishing.service.gov.uk/media/68494ad7aae47e0d6c06e078/df_VEH0160_UK.csv
Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
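Because each of these files stores one column per quarter, a common first step in Python is reshaping them to one row per vehicle group and quarter. The following is a minimal sketch using only the standard library; the quarter labels ("2024Q4", "2024Q3") and the sample rows are made up for illustration, and the real header names should be checked in the downloaded file. It also assumes every count is a plain integer, which may not hold for every cell.

```python
import csv
import io

# Identifier columns taken from the schema lines above; every other column
# is treated as a per-quarter count.
ID_COLUMNS = ["BodyType", "Make", "GenModel", "Model", "Fuel", "LicenceStatus"]


def wide_to_long(csv_text):
    """Turn one-column-per-quarter rows into one record per quarter."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # LicenceStatus is absent from the df_VEH0160 files, so keep only the
    # identifier columns that are actually present.
    id_cols = [c for c in ID_COLUMNS if c in reader.fieldnames]
    quarter_cols = [c for c in reader.fieldnames if c not in id_cols]
    records = []
    for row in reader:
        for quarter in quarter_cols:
            record = {c: row[c] for c in id_cols}
            record["Quarter"] = quarter
            record["Vehicles"] = int(row[quarter])  # assumes integer counts
            records.append(record)
    return records


# Made-up sample in the documented layout (quarter labels are hypothetical).
SAMPLE = """BodyType,Make,GenModel,Model,Fuel,LicenceStatus,2024Q4,2024Q3
Cars,EXAMPLE MAKE,EXAMPLE GEN,EXAMPLE MODEL,Petrol,Licensed,120,115
Cars,EXAMPLE MAKE,EXAMPLE GEN,EXAMPLE MODEL,Petrol,SORN,8,9
"""

long_rows = wide_to_long(SAMPLE)
```

For the full files, the same function could be fed the file contents, although a streaming variant would be kinder to memory given the file sizes listed above.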
In order to keep the datafile df_VEH0124 to a reasonable size, it has been split into 2 halves; 1 covering makes starting with A to M, and the other covering makes starting with N to Z.
df_VEH0124_AM: https://assets.
https://creativecommons.org/publicdomain/zero/1.0/
Your notebooks must contain the following steps:
CSV file - 19237 rows x 18 columns (includes the Price column as target)
Attributes: ID, Price (price of the car; target column), Levy, Manufacturer, Model, Prod. year, Category, Leather interior, Fuel type, Engine volume, Mileage, Cylinders, Gear box type, Drive wheels, Doors, Wheel, Color, Airbags
Confused or have doubts about the data column values? Check the dataset discussion tab!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Car Prices Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sidharth178/car-prices-dataset on 29 August 2021.
--- Dataset description provided by original source is as follows ---
With the rise in the variety of cars with differentiated capabilities and features such as model, production year, category, brand, fuel type, engine volume, mileage, cylinders, colour, airbags and many more, we are bringing a car price prediction challenge for all. We all aspire to own a car within budget with the best features available. To solve the price problem we have created a dataset with 19237 rows for training and 8245 rows for testing.
Train.csv - 19237 rows x 18 columns (includes the Price column as target)
Attributes: ID, Price (price of the car; target column), Levy, Manufacturer, Model, Prod. year, Category, Leather interior, Fuel type, Engine volume, Mileage, Cylinders, Gear box type, Drive wheels, Doors, Wheel, Color, Airbags
Test.csv - 8245 rows x 17 columns
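As a rough illustration of how the attribute list maps onto a price-prediction setup, the sketch below checks that a CSV carries the 18 listed columns and separates the Price target from the remaining features. The exact header spellings in Train.csv are assumed to match the list above, and the single sample row is invented purely so the example runs without the real file.

```python
import csv
import io

# Column names as given in the attribute list (18 columns in Train.csv).
TRAIN_COLUMNS = [
    "ID", "Price", "Levy", "Manufacturer", "Model", "Prod. year", "Category",
    "Leather interior", "Fuel type", "Engine volume", "Mileage", "Cylinders",
    "Gear box type", "Drive wheels", "Doors", "Wheel", "Color", "Airbags",
]
TARGET = "Price"


def split_features_target(csv_text):
    """Validate the header, then return (feature dicts, target prices)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in TRAIN_COLUMNS if c not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    features, targets = [], []
    for row in reader:
        targets.append(float(row[TARGET]))
        # Drop the target and the row identifier from the feature set.
        features.append({k: v for k, v in row.items() if k not in (TARGET, "ID")})
    return features, targets


# Invented one-row sample so the sketch is runnable without Train.csv.
example_row = {c: "0" for c in TRAIN_COLUMNS}
example_row.update({"ID": "1", "Price": "13500", "Manufacturer": "EXAMPLE"})
sample_csv = (
    ",".join(TRAIN_COLUMNS) + "\n"
    + ",".join(example_row[c] for c in TRAIN_COLUMNS) + "\n"
)

features, targets = split_features_target(sample_csv)
```

Test.csv has one column fewer (17), consistent with the target column being withheld there.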
--- Original source retains full ownership of the source dataset ---
The NHTSA Product Information Catalog and Vehicle Listing (vPIC) is a consolidated platform that presents data collected within the manufacturer reported data from CFR 49 Parts 551 - 574 for use in a variety of modern tools. NHTSA's vPIC platform is intended to serve as a centralized source for basic Vehicle Identification Number (VIN) decoding, Manufacturer Information Database (MID), Manufacturer Equipment Plant Identification and associated data. vPIC is intended to support the Open Data and Transparency initiatives of the agency by allowing the data to be freely used by the public without the burden of manual retrieval from a library of electronic documents (PDFs). While these documents will still be available online for viewing within the Manufacturer Information Database (MID) module of vPIC one can view and use the actual data through the VIN Decoder and Application Programming Interface (API) modules.
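For readers who want to try the API module mentioned above, the sketch below only builds a VIN-decoding request URL; it does not contact the service. The base URL, the DecodeVin path, and the format and modelyear query parameters are written from memory of NHTSA's public vPIC API documentation and should be verified there before use. The VIN shown is a placeholder string for illustration.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL and endpoint name follow the public vPIC API docs.
VPIC_BASE = "https://vpic.nhtsa.dot.gov/api/vehicles"


def build_decode_vin_url(vin, model_year=None):
    """Return a DecodeVin request URL asking for a JSON response."""
    vin = vin.strip().upper()
    if not vin:
        raise ValueError("VIN must not be empty")
    params = {"format": "json"}
    if model_year is not None:
        params["modelyear"] = str(model_year)
    return f"{VPIC_BASE}/DecodeVin/{quote(vin)}?{urlencode(params)}"


# Placeholder VIN used only to show the URL shape.
url = build_decode_vin_url("1hgcm82633a004352")
```

The resulting URL could then be fetched with any HTTP client; response field names are not covered here because they are not described in the text above.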
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘cars.csv’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/huseyinrakun/carscsv on 28 January 2022.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
The steps listed below must be included in your notebooks:
Data source - https://www.cardekho.com/used-car-details
Cover image source - https://cdni.autocarindia.com/Utils/ImageResizer.ashx?n=https://cdni.autocarindia.com/Galleries/20200206032922_Tata-Harrier-BS6-5.jpg&w=872&h=578&q=75&c=1
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘car_sales.csv’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/smritisingh1997/car-salescsv on 14 February 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains data related to car sales.
The data is required for the basic Linear Regression model. It can be used to explore all the basic Linear Regression assumptions, which are required if one wants to apply Linear Regression on the given data
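As a starting point for exploring those assumptions, the sketch below fits a one-variable least-squares line by hand and returns the residuals, which are the usual input to checks of linearity and constant variance. The x and y values are invented for illustration and are not taken from car_sales.csv, whose column names are not described here.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values, one per data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Invented numbers, not from the dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_ols(xs, ys)
res = residuals(xs, ys, slope, intercept)
```

Plotting the residuals against the fitted values or against x is then a simple visual check; a visible trend or funnel shape would suggest that an assumption is violated.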
We wouldn't be here without the help of others. I would especially like to thank @Udemy, @Coursera, and @KhanAcademy.
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains a set of damaged car images, each labeled in the CSV file as fraudulent or non-fraudulent with respect to damage claims.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘cars.csv’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/karimkeraani/carscsv on 28 January 2022.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of information about 50,815 policyholders who are car owners.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Battery electric vehicles (BEVs) are crucial for a sustainable transportation system. As more people adopt BEVs, it becomes increasingly important to accurately assess the demand for charging infrastructure. However, much of the current research on charging infrastructure relies on outdated assumptions, such as the assumption that all BEV owners have access to home chargers and the "Liquid-fuel" mental model. To address this issue, we simulate the travel and charging demand on three charging behavior archetypes. We use a large synthetic population of Sweden, including detailed individual characteristics, such as dwelling types (detached house vs. apartment) and activity plans (for an average weekday). This data repository aims to provide the BEV simulation's input, assumptions, and output so that other studies can use them to study sizing and location design of charging infrastructure, grid impact, etc.
A journal paper published in Transportation Research Part D: Transport and Environment details the method to create the data (particularly Section 2.2 BEV simulation).
https://doi.org/10.1016/j.trd.2023.103645
Methodology
This data product is centered on the 1.7 million inhabitants of the Västra Götaland (VG) region, which includes the second largest city in Sweden, Gothenburg. We specifically simulated 284,000 car agents who live in VG, representing 35% of all car users and 18% of the total population in the region. They spend their simulation day (representing an average weekday) in a variety of locations throughout Sweden.
This open data repository contains the core model inputs and outputs. The numbers in parentheses correspond to the data sets. We use individual agents' activity plans (1) and travel trajectories from MATSim simulation for the BEV simulation (2), in which we consider overnight charger access (3), car fleet composition referencing the current private car fleet in Sweden (4), and Swedish road network with slope information (5) with realistic BEV charging & discharging dynamics. For the BEV simulation, we tested ten scenarios of charging behavior archetypes and fast charging powers (6). The output includes the time history of travel trajectories and charging of the simulated BEVs across the different scenarios (7).
Data description
The current data product covers seven data files.
(1) Agents' experienced activity plans
File name: 1_activity_plans.csv
Column | Description | Data type | Unit
person | Agent ID | Integer | -
act_id | Activity index of each agent | Integer | -
deso | Zone code of Demographic statistical areas (DeSO) [1] | String | -
POINT_X | Coordinate X of activity location (SWEREF99TM) | Float | meter
POINT_Y | Coordinate Y of activity location (SWEREF99TM) | Float | meter
act_purpose | Activity purpose (work, home, other) | String | -
mode | Transport mode to reach the activity location (car) | String | -
dep_time | Departure time in decimal hour (0-23.99) | Float | hour
trav_time | Travel time to reach the activity location | String | hour:minute:second
trav_time_min | Travel time in decimal minute | Float | minute
speed | Travel speed to reach the activity location | Float | km/h
distance | Travel distance between the origin and the destination | Float | km
act_start | Start time of activity in minute (0-1439) | Integer | minute
act_time | Activity duration in decimal minute | Float | minute
act_end | End time of activity in decimal hour (0-23.99) | Float | hour
score | Utility score of the simulation day given by MATSim | Float | -
[1] https://www.scb.se/vara-tjanster/oppna-data/oppna-geodata/deso--demografiska-statistikomraden/
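The activity plan mixes three time encodings: decimal hours (dep_time, act_end), minutes of the day (act_start) and hour:minute:second strings (trav_time). The helpers below are a small sketch of how these could be brought to a common unit; the function names are hypothetical, and the string parser assumes whole-number seconds.

```python
def hms_to_minutes(hms):
    """Convert an 'hour:minute:second' string (trav_time) to decimal minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


def decimal_hour_to_clock(decimal_hour):
    """Convert a decimal hour (dep_time, act_end) to an 'HH:MM' label."""
    total_minutes = int(round(decimal_hour * 60))
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


def minute_of_day_to_decimal_hour(minute):
    """Convert a minute of the day (act_start, 0-1439) to decimal hours."""
    return minute / 60


travel_minutes = hms_to_minutes("00:12:30")
departure_label = decimal_hour_to_clock(7.75)
start_hour = minute_of_day_to_decimal_hour(465)
```

Converting trav_time this way should reproduce the trav_time_min column, which offers a quick consistency check after loading the file.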
(2) Travel trajectories
File name: 2_input_zip
Produced by MATSim simulation, the zip folder contains ten files (events_batch_X.csv.gz, X=1, 2, …, 10) of input events for the BEV simulation. They are the moving trajectories of the car agents in their simulation days.
Column | Description | Data type | Unit
time | Time in second in a simulation day (0-86399) | Integer | Second
type | Event type defined by MATSim simulation [2] | String | -
person | Agent ID | Integer | -
link | Nearest road link consistent with (5) | String | -
vehicle | Vehicle ID identical to person | Integer | -
[2] One typical episode of MATSim simulation events: Activity ends (actend) -> Agent's vehicle enters traffic (vehicle enters traffic) -> Agent's vehicle moves from previous road segment to its next connected one (left link) -> Agent's vehicle leaves traffic for activity (vehicle leaves traffic) -> Activity starts (actstart)
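The event sequence described in the note can be turned into driving episodes by pairing each "vehicle enters traffic" event with the next "vehicle leaves traffic" event of the same agent. The sketch below does this for in-memory records; it assumes the event type strings appear in the files exactly as written in the note, and the sample events are invented.

```python
from collections import defaultdict


def driving_episodes(events):
    """Group events into per-agent driving episodes.

    events: iterable of dicts with keys 'time', 'type', 'person', 'link'.
    Returns {person: [{'start': t0, 'end': t1, 'links_left': n}, ...]}.
    """
    open_trips = {}
    episodes = defaultdict(list)
    # sorted() is stable, so events sharing a timestamp keep their file order.
    for ev in sorted(events, key=lambda e: e["time"]):
        person = ev["person"]
        if ev["type"] == "vehicle enters traffic":
            open_trips[person] = {"start": ev["time"], "links_left": 0}
        elif ev["type"] == "left link" and person in open_trips:
            open_trips[person]["links_left"] += 1
        elif ev["type"] == "vehicle leaves traffic" and person in open_trips:
            trip = open_trips.pop(person)
            trip["end"] = ev["time"]
            episodes[person].append(trip)
    return dict(episodes)


# Invented events for one agent, following the documented episode order.
sample_events = [
    {"time": 25200, "type": "actend", "person": 1, "link": "a"},
    {"time": 25200, "type": "vehicle enters traffic", "person": 1, "link": "a"},
    {"time": 25260, "type": "left link", "person": 1, "link": "a"},
    {"time": 25320, "type": "left link", "person": 1, "link": "b"},
    {"time": 25400, "type": "vehicle leaves traffic", "person": 1, "link": "c"},
    {"time": 25400, "type": "actstart", "person": 1, "link": "c"},
]

trips = driving_episodes(sample_events)
```

Each batch file could be read with gzip and csv.DictReader and passed through the same function, converting the time column to int first.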
(3) Overnight charger access
File name: 3_home_charger_access.csv
Column | Description | Data type | Unit
person | Agent ID | Integer | -
home_charger | Whether an agent has access to a home garage charger/living in a detached house (0=no, 1=yes) | Integer | -
(4) Car fleet composition
File name: 4_car_fleet.csv
Column | Description | Data type | Unit
person | Agent ID | Integer | -
income_class | Income group (0=None, 1=below 180K, 2=180K-300K, 3=300K-420K, 4=above 420K) | Integer | -
car | Car model class (B=40 kWh, C=60 kWh, D=100 kWh) | String | -
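The car column encodes battery size through the class letter. A small lookup, sketched below, makes that explicit and gives a quick fleet summary; the capacities come from the column description, while the function name and the sample list of classes are made up.

```python
from collections import Counter

# Battery capacity per car model class, from the column description.
BATTERY_KWH = {"B": 40, "C": 60, "D": 100}


def fleet_summary(car_classes):
    """Return (count per class, total battery capacity in kWh)."""
    counts = Counter(car_classes)
    total_kwh = sum(BATTERY_KWH[cls] * n for cls, n in counts.items())
    return counts, total_kwh


# Invented example: four agents with their assigned car classes.
counts, total_kwh = fleet_summary(["B", "C", "C", "D"])
```

An unknown class letter raises a KeyError here, which is a deliberate choice so that unexpected values in the file surface early.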
(5) Road network with slope information
File name: 5_road_network_with_slope.shp (5 files in total)
Column | Description | Data type | Unit
length | The length of road link | Float | meter
freespeed | Free speed | Float | km/h
capacity | Number of vehicles | Integer | -
permlanes | Number of lanes | Integer | -
oneway | Whether the segment is one-way (0=no, 1=yes) | Integer | -
modes | Transport mode (car) | String | -
link_id | Link ID | String | -
from_node | Start node of the link | String | -
to_node | End node of the link | String | -
count | Aggregated traffic (number of cars travelled per day) | Integer | -
slope | Slope in percent from -6% to 6% | Float | -
geometry | LINESTRING (SWEREF99TM) | geometry | meter
(6) Simulation scenarios specifying the parameter sets
File name: 6_scenarios.txt
Parameter set (paraset) | Strategy 1 | Strategy 2 | Strategy 3 | Fast charging power (kW) | Minimum parking time for charging (min) | Intermediate charging power (kW)
0 | 0.2 | 0.2 | 0.9 | 150 | 5 | 22
1 | 0.2 | 0.2 | 0.9 | 50 | 5 | 22
2 | 0.3 | 0.3 | 0.9 | 150 | 5 | 22
3 | 0.3 | 0.3 | 0.9 | 50 | 5 | 22
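For scripted comparisons across scenarios, the parameter sets can be held in a small typed structure. The sketch below copies the four rows from the table; the field names are hypothetical, and the three strategy values are stored as given without interpreting what they represent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSet:
    paraset: int
    strategy_1: float
    strategy_2: float
    strategy_3: float
    fast_charging_kw: int
    min_parking_min: int
    intermediate_charging_kw: int


# Values copied row by row from the scenario table.
PARAMETER_SETS = [
    ParameterSet(0, 0.2, 0.2, 0.9, 150, 5, 22),
    ParameterSet(1, 0.2, 0.2, 0.9, 50, 5, 22),
    ParameterSet(2, 0.3, 0.3, 0.9, 150, 5, 22),
    ParameterSet(3, 0.3, 0.3, 0.9, 50, 5, 22),
]


def with_fast_charging(power_kw):
    """Return the parameter set IDs that use the given fast charging power."""
    return [p.paraset for p in PARAMETER_SETS if p.fast_charging_kw == power_kw]


slow_sets = with_fast_charging(50)
```

Freezing the dataclass keeps the scenario definitions immutable, so they can safely be shared between analysis steps.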
(7) Time history of travel trajectories and charging of the simulated BEVs
File name: 7_output.zip
Produced by the BEV simulation, the zip folder contains four files (parasetX.csv.gz, X=1, 2, 3, 4) corresponding to the four parameter sets specified in (6). They are the moving trajectories of the car agents with simulated energy and charging time history in their simulation days.
Column | Description | Data type | Unit
person | Agent ID | Integer | -
home_charger | Whether an agent has access to a home garage charger/living in a detached house (0=no, 1=yes) | Integer | -
car | Car model class (B=40 kWh, C=60 kWh, D=100 kWh) | String | -
seq | Sequence ID of time history by agent | Integer | -
time | Time (0-86399) | Integer | Second
purpose | Valid for activities (home, work, school,
https://brightdata.com/license
Our automotive datasets provide comprehensive insights into the global vehicle market, covering a wide range of data points related to car listings, pricing trends, vehicle specifications, and market demand. These datasets are ideal for businesses, analysts, and developers looking to enhance automotive research, optimize pricing strategies, or improve vehicle inventory management.
Key Features:
Vehicle Listings & Specifications: Access detailed information on cars, trucks, SUVs, motorcycles, and electric vehicles, including make, model, year, trim, mileage, fuel type, and transmission.
Pricing & Market Trends: Analyze historical and real-time pricing data to track market fluctuations, assess vehicle depreciation, and optimize pricing strategies.
Dealer & Private Seller Insights: Gain visibility into vehicle listings from dealerships and private sellers, including contact details, location, and availability.
Vehicle Condition & Features: Identify key attributes such as accident history, service records, safety features, and additional specifications.
Regional & Global Coverage: Access datasets segmented by country, state, or city to analyze local and international automotive markets.
Use Cases:
Market Research & Competitive Analysis: Monitor automotive trends, track competitor pricing, and assess consumer demand.
Pricing Optimization: Adjust vehicle pricing based on real-time market data to maximize profitability and sales efficiency.
Inventory & Fleet Management: Improve vehicle sourcing, inventory tracking, and fleet management for dealerships and rental companies.
Automotive AI & Machine Learning: Train predictive models for vehicle valuation, demand forecasting, and fraud detection.
Consumer Insights & Lead Generation: Identify potential buyers, analyze purchasing behavior, and enhance targeted marketing efforts.
Our automotive datasets are available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into the automotive industry with high-quality, structured data tailored to your needs.
This dataset shows the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) that are currently registered through Washington State Department of Licensing (DOL).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was hosted on IBM Cloud object
You can find the "Automobile Dataset" from the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data.
I cleaned the data myself; see the notebook "Used Car Pricing Data Wrangling" for details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source/Credit: Michael Grogan https://github.com/MGCodesandStats https://github.com/MGCodesandStats/datasets/blob/master/cars.csv
Sample dataset for regression analysis. Given 5 attributes (age, gender, miles driven per day, debt, and income) predict how much someone will spend on purchasing a car. All 5 of the input attributes have been scaled to be in 0 to 1 range. Training set has 723 training examples. Test set has 242 test examples.
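The description states that the inputs are scaled to the 0 to 1 range but not how. Min-max scaling is one common way to achieve this, and the sketch below shows it on invented values; it is not confirmed to be the method used for this dataset.

```python
def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Invented ages, not taken from the dataset.
scaled_ages = min_max_scale([20, 30, 40])
```

When a separate test set is scaled, the minimum and maximum from the training set would normally be reused so both sets share one scale.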
This dataset will be used in an upcoming Galaxy Training Network tutorial (https://training.galaxyproject.org/training-material/topics/statistics/) on use of feedforward neural networks for regression analysis.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a curated collection of 3D car models derived from Objaverse-XL described in MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling. The MeshFleet dataset provides metadata for 3D car models, including their SHA256 from Objaverse-XL, vehicle category, and size. The core dataset is available as a CSV file: meshfleet_with_vehicle_categories_df.csv. You can easily load it using pandas: import pandas as pd
meshfleet_df =… See the full description on the dataset page: https://huggingface.co/datasets/DamianBoborzi/MeshFleet.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides supporting data for the figures presented in our study on electric vehicle (EV) usage and charging behavior across major Chinese cities. The detailed analysis and raw data are described in Zhan et al. (2025). The study examines 1.69 million EVs, representing 42% of China's total EV fleet, from November 2020 to October 2021. By analyzing diverse vehicle types, including private cars, taxis, buses, and special purpose vehicles (SPVs), it provides insights into operational demands, infrastructure requirements, and energy consumption patterns.
The purpose of this dataset is to enable researchers who do not have access to the same raw data to replicate, calibrate, or extend our findings using the processed data that underpins each figure. This resource supports further research on EV infrastructure planning, energy consumption, and vehicle performance. Please refer to Zhan et al. (2025) for full details on the methodology and analysis.
This dataset includes the processed data underlying each figure in Zhan et al. (2025), covering various aspects of EV usage, battery capacity, and charging behavior across seven major Chinese cities: Beijing, Shanghai, Guangzhou, Shenzhen, Nanjing, Chengdu, and Chongqing. The files are organized to correspond directly with the figures in the paper, facilitating their use for further analysis and model calibration.
Fig1a.Distribution of EV types across selected Chinese cities
File: Fig1a.Distribution of EV types across selected Chinese cities.csv
Description: Distribution of EV types across seven cities, detailing the share of different vehicle types.
Column | Description | Data type | Unit
Beijing | Distribution of EV types in Beijing | Float | %
Shenzhen | Distribution of EV types in Shenzhen | Float | %
Shanghai | Distribution of EV types in Shanghai | Float | %
Guangzhou | Distribution of EV types in Guangzhou | Float | %
Chengdu | Distribution of EV types in Chengdu | Float | %
Chongqing | Distribution of EV types in Chongqing | Float | %
Nanjing | Distribution of EV types in Nanjing | Float | %
Fig1b.Distribution of battery energy by vehicle types
File: Fig1b.Distribution of battery energy by vehicle types.csv
Description: Distribution of battery energy across different vehicle types, represented as box plot statistics.
Column | Description | Data type | Unit
type_2 | Vehicle type | String | -
Lower Whisker | The battery energy corresponding to the Lower Whisker of the box plot. | Float | kWh
Q1 (25%) | The 25th percentile value of battery energy. | Float | kWh
Median (50%) | The median value of battery energy. | Float | kWh
Q3 (75%) | The 75th percentile value of battery energy. | Float | kWh
Upper Whisker | The battery energy corresponding to the Upper Whisker of the box plot. | Float | kWh
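To relate these columns to raw measurements, the sketch below computes the same five statistics from a list of values. It assumes the common Tukey convention, where whiskers extend to the most extreme observations within 1.5 times the interquartile range of the quartiles; the text above does not state which whisker or quantile convention the study used, and the input values are invented.

```python
import statistics


def box_plot_stats(values):
    """Five box plot statistics using the 1.5 * IQR whisker convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "Lower Whisker": min(v for v in data if v >= lower_fence),
        "Q1 (25%)": q1,
        "Median (50%)": median,
        "Q3 (75%)": q3,
        "Upper Whisker": max(v for v in data if v <= upper_fence),
    }


# Invented values with one outlier (30) that falls beyond the upper fence.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

The dictionary keys mirror the column names in the table, so a result can be compared directly against a row of the published file.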
Fig1c.Variations of battery energy of buses
File: Fig1c.Variations of battery energy of buses across studied cities.csv
Description: Battery energy variations for buses across the studied cities.
Column | Description | Data type | Unit
city_En | English name of the city (one of the 7 Chinese cities) | String | -
Lower Whisker | The battery energy of buses corresponding to the Lower Whisker of the box plot. | Float | kWh
Q1 (25%) | The 25th percentile value of battery energy of buses. | Float | kWh
Median (50%) | The median value of battery energy of buses. | Float | kWh
Q3 (75%) | The 75th percentile value of battery energy of buses. | Float | kWh
Upper Whisker | The battery energy of buses corresponding to the Upper Whisker of the box plot. | Float | kWh
Fig1d.Variations of battery energy of SPVs
File: Fig1c.Variations of battery energy of SPVs across studied cities.csv
Description: Battery energy variations for special purpose vehicles (SPVs) across cities.
Column | Description | Data type | Unit
city_En | English name of the city (one of the 7 Chinese cities) | String | -
Lower Whisker | The battery energy of SPVs corresponding to the Lower Whisker of the box plot. | Float | kWh
Q1 (25%) | The 25th
http://data.gov.hk/en/terms-and-conditions
Table 4.1 (f) - First Registration of Light Goods Vehicles by Make, First Registration Vehicle Status, Fuel Type and Body Type (Simplified Chinese)