http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This dataset consists of various types of cars. The dataset is organized into 2 folders (train, test) and contains subfolders for each car category. There are 4,165 images (JPG) and 7 classes of cars.
Please give credit to this dataset if you download it.
This dataset was created by Lilit Janjughazyan
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains data on second-hand cars listed on Quikr Cars. The data was scraped from the Quikr website and covers about 1,000 cars, with the following features:
- Name
- Company
- Quikr Label (Platinum / Gold)
- Location
- Price
- Kms driven
- Fuel type
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Car Prices Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sidharth178/car-prices-dataset on 29 August 2021.
--- Dataset description provided by original source is as follows ---
With the rise in the variety of cars with differentiated capabilities and features such as model, production year, category, brand, fuel type, engine volume, mileage, cylinders, colour, airbags and many more, we are bringing a car price prediction challenge for all. We all aspire to own a car within budget with the best features available. To address the pricing problem, we have created a dataset with 19237 rows for training and 8245 rows for testing.
Train.csv - 19237 rows x 18 columns (includes the Price column as the target)
Attributes:
- ID
- Price: price of the car (target column)
- Levy
- Manufacturer
- Model
- Prod. year
- Category
- Leather interior
- Fuel type
- Engine volume
- Mileage
- Cylinders
- Gear box type
- Drive wheels
- Doors
- Wheel
- Color
- Airbags
Test.csv - 8245 rows x 17 columns
--- Original source retains full ownership of the source dataset ---
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The actual dataset is located on Kaggle. It contains data scraped by MyAutoScrapper (written in Go).
Since this Kaggle dataset contains real car deals, posted by real people with pictures, it can be used for real-world machine learning (ML) or machine vision tasks such as price prediction and image processing.
This dataset contains a data.csv file with details of 100,000 car deals, one deal per row. data.csv has 18 columns:
- ID: Unique identifier for each entry. For each ID there is a matching sub-folder in images that contains the images for that deal. ID is an integer starting from 0.
- Manufacturer: A string identifying the car manufacturer.
- Model: A string identifying the car model.
- Year: An integer for the car production year.
- Category: The type of the vehicle (Sedan, Cabriolet, etc.).
- Mileage: An integer representing the car mileage in kilometers.
- FuelType: The fuel type the car uses.
- EngineVolume: A floating point number representing the engine volume in litres.
- DriveWheels: A string representing the car drive wheels (e.g. Front, Rear, 4x4).
- GearBox: A string identifying the gear box of the transmission (Manual, Automatic, etc.).
- Doors: A string representing the car doors (4, 4/5, etc.).
- Wheel: Steering wheel position (Left Wheel, Right Wheel).
- Color: Color of the car body.
- InteriorColor: Interior color.
- VIN: VIN of the vehicle, represented as a string.
- LeatherInterior: A boolean value, true if the car has a leather interior.
- Price: Price of the car in USD. If omitted, the price was set as negotiable.
- Clearance: A boolean value indicating whether customs has been cleared or not.
None of the fields (except ID) is guaranteed to be filled in, or filled in correctly, since people sometimes enter incorrect information or withhold it. For most entries, however, most fields should contain correct information.
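Because only ID is guaranteed to be present, a reader of data.csv probably needs to treat every other field as optional. The following is a minimal sketch, not the dataset author's tooling. It assumes a comma-separated file whose header uses exactly the column names listed above and that missing values appear as empty strings; `read_deals` is a hypothetical helper name.

```python
import csv
import io


def read_deals(text_stream):
    """Yield one dict per deal, mapping empty fields to None.

    Assumptions (not confirmed by the dataset description): comma delimiter,
    header names as listed in the description, empty string for missing values.
    """
    for row in csv.DictReader(text_stream):
        price_text = (row.get("Price") or "").strip()
        yield {
            "id": int(row["ID"]),  # the only field guaranteed to be filled
            "manufacturer": (row.get("Manufacturer") or "").strip() or None,
            "price_usd": float(price_text) if price_text else None,
            # An omitted price means the price was set as negotiable.
            "negotiable": not price_text,
        }


# Hypothetical two-row sample, for illustration only.
sample = "ID,Manufacturer,Price\n0,Toyota,5000\n1,,\n"
deals = list(read_deals(io.StringIO(sample)))
```

Keeping `negotiable` as a separate flag avoids confusing a missing price with a price of zero when the data is later used for price prediction.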
This dataset shows the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) that are currently registered through Washington State Department of Licensing (DOL).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘car_sales.csv’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/smritisingh1997/car-salescsv on 14 February 2022.
--- Dataset description provided by original source is as follows ---
This data contains data related to Car Sales
The data is required for the basic Linear Regression model. It can be used to explore all the basic Linear Regression assumptions, which are required if one wants to apply Linear Regression on the given data
We wouldn't be here without the help of others. I would especially like to thank @Udemy, @Coursera, and @KhanAcademy.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Car Allowance Rebate System (CARS), otherwise known as Cash for Clunkers, was a program intended to provide economic incentives to United States residents to purchase a new, more fuel-efficient vehicle when trading in a less fuel-efficient vehicle. The program was promoted as providing stimulus to the economy by boosting auto sales, while putting safer, cleaner and more fuel-efficient vehicles on the road.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Application and use cases
1) Market Analysis: Evaluate overall trends and regional variations in car sales to assess manufacturer performance, model preferences, and demographic insights.
2) Seasonal Patterns and Competitor Analysis: Investigate seasonal and cyclical patterns in sales.
3) Forecasting and Predictive Analysis: Use historical data to forecast and predict future market trends. Support marketing, advertising, and investment decisions based on insights.
4) Supply Chain and Inventory Optimization: Provide valuable data for stakeholders in the automotive industry.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is the outcome of the paper "Floating Car Data Map-matching Utilizing the Dijkstra Algorithm", accepted for the 3rd International Conference on Data Management, Analytics & Innovation, held in Kuala Lumpur, Malaysia, in 2019.
The floating car data (FCD, representing the movement of cars with their position in time) is produced by the traffic simulator software (further referred to as the Simulator) published in [1] and can be used as an input for data processing and benchmarking. The dataset contains FCD of various quality levels based on the routing graph of the Czech Republic derived from OpenStreetMap (openstreetmap.org).
Should the dataset be used for scientific or other purposes, acknowledgement of or reference to our paper [1] and this dataset is welcome and highly appreciated.
Archive contents
The archive contains the following folders.
city_oneway and city_roadtrip - FCD from the city of Brno, Czech Republic, where FCD is based on Origin-Destination in the case of oneway and Origin-Destination-Origin in the case of a road trip
intercity_oneway and intercity_roadtrip - FCD from the cities of Brno, Ostrava, Olomouc and Zlin, all in the Czech Republic, where FCD is based on Origin-Destination in the case of oneway and Origin-Destination-Origin in the case of a road trip
Content explanation
All four of the folders mentioned above contain raw FCD as produced by our Simulator, post-processed FCD that enriches the Simulator FCD, and obfuscated raw FCD (at both low and high obfuscation levels). In both obfuscated data sets, each measured point was moved in a random direction by a number of meters drawn from a Gaussian distribution. We utilized two Gaussian distributions, one for the roads outside the city (N(0,10) for the lower and N(0,20) for the higher obfuscation level) and one for the roads inside the city (N(0,15) and N(0,30) respectively). Then a predefined share of randomly chosen points was removed (3% in our case). This approach should roughly represent the real conditions encountered by FCD, as described by El Abbous and Samanta [2].
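The obfuscation procedure described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code. It assumes that the second parameter of N(0, s) is a standard deviation in meters (the description does not say whether it is a standard deviation or a variance), uses a simple flat-earth approximation to shift coordinates, and introduces the hypothetical names `SIGMA_M`, `displace` and `obfuscate`.

```python
import math
import random

# Assumption: the second parameter of N(0, s) is a standard deviation in meters.
SIGMA_M = {
    ("outside", "low"): 10.0,
    ("outside", "high"): 20.0,
    ("inside", "low"): 15.0,
    ("inside", "high"): 30.0,
}
EARTH_RADIUS_M = 6_371_000.0


def displace(lat, lon, distance_m, angle_rad):
    """Shift a point by distance_m along angle_rad (0 = north), flat-earth approximation."""
    dlat = distance_m * math.cos(angle_rad) / EARTH_RADIUS_M
    dlon = distance_m * math.sin(angle_rad) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def obfuscate(points, level, drop_fraction=0.03, seed=None):
    """Move each point and then drop a random share of points.

    points: list of (lat, lon, zone) tuples, zone being "inside" or "outside" the city.
    level: "low" or "high" obfuscation.
    """
    rng = random.Random(seed)
    moved = []
    for lat, lon, zone in points:
        # A negative draw simply moves the point in the opposite direction.
        distance = rng.gauss(0.0, SIGMA_M[(zone, level)])
        angle = rng.uniform(0.0, 2.0 * math.pi)
        new_lat, new_lon = displace(lat, lon, distance, angle)
        moved.append((new_lat, new_lon, zone))
    n_drop = round(len(moved) * drop_fraction)
    dropped = set(rng.sample(range(len(moved)), n_drop))
    return [p for i, p in enumerate(moved) if i not in dropped]
```

For example, `obfuscate(points, "high", seed=42)` applied to 100 points returns 97 displaced points; fixing the seed makes the result reproducible for benchmarking.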
In the case of post-processed road trip data, there is one extra dataset with a "cache" suffix, representing the very same dataset limited to a 5-minute session memoization. This folder also contains a picture of the processed FCD shown on a map.
Data format
Standard UTF-8 encoded CSV files, separated by semicolons, with the following columns:
RAW
Header
session_id;timestamp;lat;lon;speed;bearing;segment_id
Data
session_id: (Type: unsigned INT) - session (car) identifier
timestamp: (Type: datetime) - timestamp in UTC
lat: (Type: unsigned long) - latitude as used in Google Maps
lon: (Type: unsigned long) - longitude as used in Google Maps
speed: (Type: unsigned INT) - actual speed in km/h
bearing: (Type: unsigned INT) - actual bearing in degrees (0-360)
segment_id: (Type: unsigned long) - unique edge identifier
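As an illustration of the RAW layout, the sketch below reads a semicolon-separated stream with the header given above. It is only a sketch under assumptions: timestamps are taken to be ISO-formatted, and lat/lon are parsed as decimal degrees (floats) because they are described as Google Maps coordinates, even though the type is listed as unsigned long. The function name `read_raw_fcd` and the sample row are hypothetical.

```python
import csv
import io
from datetime import datetime

RAW_COLUMNS = ["session_id", "timestamp", "lat", "lon", "speed", "bearing", "segment_id"]


def read_raw_fcd(text_stream):
    """Yield typed records from a RAW FCD stream (semicolon-separated, UTF-8)."""
    reader = csv.DictReader(text_stream, delimiter=";")
    if reader.fieldnames != RAW_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    for row in reader:
        yield {
            "session_id": int(row["session_id"]),
            # Assumption: ISO-like timestamps such as "2019-01-01 12:00:00".
            "timestamp": datetime.fromisoformat(row["timestamp"]),
            # Assumption: decimal degrees, as in Google Maps.
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "speed": int(row["speed"]),      # km/h
            "bearing": int(row["bearing"]),  # degrees, 0-360
            "segment_id": int(row["segment_id"]),
        }


# Hypothetical sample row, for illustration only.
sample = (
    "session_id;timestamp;lat;lon;speed;bearing;segment_id\n"
    "1;2019-01-01 12:00:00;49.1951;16.6068;50;90;12345\n"
)
records = list(read_raw_fcd(io.StringIO(sample)))
```

When reading from disk, opening the file with `open(path, newline="", encoding="utf-8")` matches the stated encoding and lets the csv module handle line endings.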
POST-PROCESSED
Header
gid;car_id;point_time;lat;lon;segment_id;speed_kmh;speed_avg_kmh;distance_delta_m;distance_total_m;speedup_ratio;duration;segment_changed;duration_segment;moved;duration_move;good;duration_good;bearing;interpolated
Data
gid: (Type: unsigned long) - global identifier of a record
car_id: (Type: unsigned INT) - session (car) identifier
point_time: (Type: datetime) - timestamp with timezone
lat: (Type: unsigned long) - latitude as used in Google Maps
lon: (Type: unsigned long) - longitude as used in Google Maps
segment_id: (Type: unsigned long) - unique edge identifier
speed_kmh: (Type: unsigned INT) - actual speed in km/h
speed_avg_kmh: (Type: unsigned long) - actual average speed of a car in km/h
distance_delta_m: (Type: unsigned long) - actual distance delta in metres
distance_total_m: (Type: unsigned long) - actual total distance of a car in metres
speedup_ratio: (Type: unsigned long) - actual speed-up ratio of a car
duration: (Type: time) - actual duration of a car
segment_changed: (Type: boolean) - signals whether the actual segment of a car differs from the previous one
duration_segment: (Type: time) - actual duration of a car on a segment
moved: (Type: boolean) - signals whether the actual position of a car differs from the previous one
duration_move: (Type: time) - actual duration of a car since moving
good: signals whether the actual record values satisfy all data constraints (all true as derived from the Simulator)
duration_good: actual duration of a car since all constraint conditions were satisfied
bearing: (Type: unsigned INT) - actual bearing in degrees (0-360)
interpolated: (Type: boolean) - signals whether the actual segment identifier is calculated (all false as derived from the Simulator)
References
[1] V. Ptošek, J. Ševčík, J. Martinovič, K. Slaninová, L. Rapant, and R. Cmar, Real-time traffic simulator for self-adaptive navigation system validation, Proceedings of EMSS-HMS: Modeling & Simulation in Logistics, Traffic & Transportation, 2018.
[2] A. El Abbous and N. Samanta, A modeling of GPS error distributions, Proceedings of the 2017 European Navigation Conference (ENC), 2017.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 2674 intermittent monthly time series that represent car parts sales from January 1998 to March 2002. It was extracted from the R expsmooth package.
The original dataset contains missing values and they have been replaced by zeros.
https://brightdata.com/license
Our automotive datasets provide comprehensive insights into the global vehicle market, covering a wide range of data points related to car listings, pricing trends, vehicle specifications, and market demand. These datasets are ideal for businesses, analysts, and developers looking to enhance automotive research, optimize pricing strategies, or improve vehicle inventory management.
Key Features:
Vehicle Listings & Specifications: Access detailed information on cars, trucks, SUVs, motorcycles, and electric vehicles, including make, model, year, trim, mileage, fuel type, and transmission.
Pricing & Market Trends: Analyze historical and real-time pricing data to track market fluctuations, assess vehicle depreciation, and optimize pricing strategies.
Dealer & Private Seller Insights: Gain visibility into vehicle listings from dealerships and private sellers, including contact details, location, and availability.
Vehicle Condition & Features: Identify key attributes such as accident history, service records, safety features, and additional specifications.
Regional & Global Coverage: Access datasets segmented by country, state, or city to analyze local and international automotive markets.
Use Cases:
Market Research & Competitive Analysis: Monitor automotive trends, track competitor pricing, and assess consumer demand.
Pricing Optimization: Adjust vehicle pricing based on real-time market data to maximize profitability and sales efficiency.
Inventory & Fleet Management: Improve vehicle sourcing, inventory tracking, and fleet management for dealerships and rental companies.
Automotive AI & Machine Learning: Train predictive models for vehicle valuation, demand forecasting, and fraud detection.
Consumer Insights & Lead Generation: Identify potential buyers, analyze purchasing behavior, and enhance targeted marketing efforts.
Our automotive datasets are available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into the automotive industry with high-quality, structured data tailored to your needs.
SpaceKnow uses satellite (SAR) data to capture activity in electric vehicles and automotive factories.
Data is updated daily, has an average lag of 4-6 days, and history back to 2017.
The insights provide you with level and change data that monitors the area which is covered with assembled light vehicles in square meters.
We offer 3 delivery options: CSV, API, and Insights Dashboard
Available companies:
- Rivian (NASDAQ: RIVN): employee parking, logistics, logistic centers, product distribution & product in the US. (See use-case write-up on page 4.)
- Tesla (NASDAQ: TSLA): indices for product, logistics & employee parking for Fremont, Nevada, Shanghai, Texas, Berlin, and at the global level
- Lucid Motors (NASDAQ: LCID): employee parking, logistics & product in the US
Why get SpaceKnow's EV datasets?
Monitor the company’s business activity: Near-real-time insights into the business activities of Rivian allow users to better understand and anticipate the company’s performance.
Assess Risk: Use satellite activity data to assess the risks associated with investing in the company.
Types of Indices Available: Continuous Feed Index (CFI) is a daily aggregation of the area of metallic objects in square meters. There are two types of CFI indices. The first one is CFI-R, which gives you level data, so it shows how many square meters are covered by metallic objects (for example assembled cars). The second one is CFI-S, which gives you change data, so it shows you how many square meters have changed within the locations between two consecutive satellite images.
How to interpret the data: SpaceKnow indices can be compared with the related economic indicators or KPIs. If the economic indicator is in monthly terms, perform a 30-day rolling sum and pick the last day of the month to compare with the economic indicator. Each data point will reflect approximately the sum of the month. If the economic indicator is in quarterly terms, perform a 90-day rolling sum and pick the last day of the 90-day window to compare with the economic indicator. Each data point will reflect approximately the sum of the quarter.
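The monthly comparison described above could be computed along these lines. This is a minimal sketch, assuming a gap-free daily series held in two parallel lists (dates and index values); `rolling_sum` and `month_end_values` are hypothetical helper names and not part of any SpaceKnow API.

```python
from datetime import date, timedelta


def rolling_sum(values, window):
    """Trailing sum over `window` consecutive daily values; None until the window is full."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running if i >= window - 1 else None)
    return out


def month_end_values(days, values, window=30):
    """Return {(year, month): rolling sum on the last calendar day of that month}."""
    sums = rolling_sum(values, window)
    result = {}
    for i, day in enumerate(days):
        is_month_end = (day + timedelta(days=1)).month != day.month
        if is_month_end and sums[i] is not None:
            result[(day.year, day.month)] = sums[i]
    return result


# Hypothetical flat series of 90 daily values, for illustration only.
days = [date(2023, 1, 1) + timedelta(days=i) for i in range(90)]
values = [1.0] * 90
monthly = month_end_values(days, values)
```

For a quarterly indicator the same helper can be called with `window=90`, keeping only the entries that fall on quarter ends.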
Product index: This index monitors the area covered by manufactured cars. The larger the area covered by the assembled cars, the larger and faster the production of a particular facility. The index rises as production increases.
Product distribution index: This index monitors the area covered by assembled cars that are ready for distribution. The index covers locations in the Rivian factory. The distribution is done via trucks and trains.
Employee parking index: Like the previous index, this one indicates the area covered by cars, but those that belong to factory employees. This index is a good indicator of factory construction, closures, and capacity utilization. The index rises as more employees work in the factory.
Logistics index: The index monitors the movement of materials supply trucks in particular car factories.
Logistics Centers index: The index monitors the movement of supply trucks in warehouses.
Where the data comes from: SpaceKnow brings you information advantages by applying machine learning and AI algorithms to synthetic aperture radar and optical satellite imagery. The company’s infrastructure searches and downloads new imagery every day, and the computations of the data take place within less than 24 hours.
In contrast to traditional economic data, which are released in monthly and quarterly terms, SpaceKnow data is high-frequency and available daily. It is possible to observe the latest movements in the EV industry with just a 4-6 day lag, on average.
The EV data help you to estimate the performance of the EV sector and the business activity of the selected companies.
The backbone of SpaceKnow’s high-quality data is the locations from which data is extracted. All locations are thoroughly researched and validated by an in-house team of annotators and data analysts.
Each individual location is precisely defined so that the resulting data does not contain noise such as surrounding traffic or changing vegetation with the season.
We use radar imagery and our own algorithms, so the final indices are not devalued by weather conditions such as rain or heavy clouds.
→ Reach out to get a free trial
Use Case - Rivian:
SpaceKnow uses the quarterly production and delivery data of Rivian as a benchmark. Rivian aimed to produce 25,000 cars in 2022. To achieve this target, the company had to increase production by 45% by producing 10,683 cars in Q4. However, actual Q4 production was 10,020 cars, so the target was narrowly missed, with total production reaching 24,337 cars for FY22.
SpaceKnow indices help us to observe the company’s operations, and we are able to monitor if the company is set to meet its forecasts or not. We deliver five different indices for Rivian, and these indices observe logistic centers, employee parking lot, logistics, product, and prod...
The controller area network (CAN) bus has emerged as the de facto standard for in-vehicle networks (IVNs) around the globe. Safety-critical components (e.g., the brakes, the engine, the transmission) depend on the CAN bus for expedient, reliable communication. Unfortunately, while the CAN bus was designed to be resilient under harsh operating conditions, it was not designed to be resilient under adversarial conditions. Standard security practices such as authentication, authorization, and encryption are completely lacking when it comes to the CAN bus. Researchers have since developed authentication, authorization, and encryption specifications for the CAN bus, but retroactive implementation of said security controls would be exorbitantly expensive—in terms of hardware, labor, engineering effort, and monetary cost. Therefore, the automotive intrusion detection system (IDS) has emerged in the literature as a low-cost, low-effort solution to the automotive [in]security problem. However, developing and evaluating an automotive IDS can be quite challenging—especially if researchers lack access to a test vehicle. Without a test vehicle, researchers are limited to publicly available CAN data, and existing CAN intrusion detection datasets come with various limitations. This lack of CAN data has become a barrier to entry into automotive intrusion detection research—and even automotive security research in general.
We seek to lower this barrier to entry by introducing a new CAN intrusion detection dataset, which facilitates the development and evaluation of automotive IDSs. Our dataset, can-train-and-test, offers real-world CAN traffic data from four different vehicles—a sedan, a compact SUV, a full-size SUV, and a pickup truck—produced by two different manufacturers. For each vehicle, we provided comparable attack captures, which enable researchers to assess a given IDS's ability to generalize to different vehicle types and models. Our dataset contains .log files for playback as well as labeled and unlabeled .csv files for supervised and unsupervised machine learning. As such, our dataset is well suited to a variety of different automotive intrusion detection and automotive security enterprises. In addition, can-train-and-test supplies nine unique attacks, ranging from denial of service (DoS) fuzzing to triple spoofing attacks. As such, researchers can select from a wide variety of attacks when partitioning the data into training and testing datasets. Alternatively, researchers can leverage our curated can-train-and-test repository, which is subdivided into four train/test sub-datasets and four testing subsets. As a benchmark, we pitted 18 machine learning models against the can-train-and-test repository. During our evaluation and analysis, we found that the multi-layer perceptron, gradient boosting, isolation forest, BIRCH, and logistic regression models consistently scored above 0.95 when it came to accuracy, precision, recall, and F1-score—regardless of the sub-dataset and testing subset. Across all experiments on all sub-datasets, we saw an average F1-score of ≈0.5039, indicating that our can-train-and-test dataset is indeed capable of distinguishing capable, well-trained IDSs from their less-than-capable counterparts. 
We present can-train-and-test as a contribution to the existing collection of open-access CAN intrusion detection datasets in hopes of filling in the gaps left by the existing collection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is formatted as a spreadsheet covering the primary activities of a non-life motor insurance portfolio over a span of three full years (November 2015 to December 2018). This dataset comprises 105,555 rows and 30 columns. Each row signifies a policy transaction, while each column represents a distinct variable.
Dataset Card for Car Reviews
As the title suggests, this dataset contains reviews of Kia cars.
Dataset Details
The dataset contains three files: train_data.csv, test_features.csv, and test_data.csv.
Dataset Description
train_data.csv contains 201 data points with two columns, Review and Target; test_features.csv contains 44 data points with only the Review column; and test_data.csv contains both the Review and Target columns.
Shared by [optional]: [Taraka Ram… See the full description on the dataset page: https://huggingface.co/datasets/Tarak2134/Car_Reviews.
**This data set was last updated 3:30 PM ET Monday, January 4, 2021. The last date of data in this dataset is December 31, 2020. **
Data shows that mobility declined nationally since states and localities began shelter-in-place strategies to stem the spread of COVID-19. The numbers began climbing as more people ventured out and traveled further from their homes, but in parallel with the rise of COVID-19 cases in July, travel declined again.
This distribution contains county level data for vehicle miles traveled (VMT) from StreetLight Data, Inc, updated three times a week. This data offers a detailed look at estimates of how much people are moving around in each county.
Data available has a two day lag - the most recent data is from two days prior to the update date. Going forward, this dataset will be updated by AP at 3:30pm ET on Monday, Wednesday and Friday each week.
This data has been made available to members of AP’s Data Distribution Program. To inquire about access for your organization - publishers, researchers, corporations, etc. - please click Request Access in the upper right corner of the page or email kromano@ap.org. Be sure to include your contact information and use case.
01_vmt_nation.csv - Data summarized to provide a nationwide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
02_vmt_state.csv - Data summarized to provide a statewide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
03_vmt_county.csv - Data providing a county level look at vehicle miles traveled. Includes VMT estimate, percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
* Filter for specific state - filters 02_vmt_state.csv daily data for a specific state.
* Filter counties by state - filters 03_vmt_county.csv daily data for counties in a specific state.
* Filter for specific county - filters 03_vmt_county.csv daily data for a specific county.
The AP has designed an interactive map to show percent change in vehicle miles traveled by county since each county's lowest point during the pandemic:
https://interactives.ap.org/vmt-map/
This data can help put your county's mobility in context with your state and over time. The data set contains different measures of change - daily comparisons and seven day rolling averages. The rolling average allows for a smoother trend line for comparison across counties and states. To get the full picture, there are also two available baselines - vehicle miles traveled in January 2020 (pre-pandemic) and vehicle miles traveled at each geography's low point during the pandemic.
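The two kinds of measures mentioned above, percent change against a baseline and a seven-day rolling average, can be illustrated with a short sketch. It is not AP's or StreetLight's published method: it assumes the baseline is a single number per geography (for example a January 2020 figure), and the function names `percent_change` and `rolling_mean` are hypothetical.

```python
def percent_change(value, baseline):
    """Percent change of `value` relative to `baseline` (e.g. a January 2020 VMT figure)."""
    return (value - baseline) / baseline * 100.0


def rolling_mean(values, window=7):
    """Trailing mean over `window` consecutive daily values; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(None)
        else:
            chunk = values[i - window + 1 : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical daily VMT values and baseline for one county, for illustration only.
daily_vmt = [90.0, 80.0, 70.0, 60.0, 50.0, 60.0, 80.0, 90.0]
baseline = 100.0
daily_change = [percent_change(v, baseline) for v in daily_vmt]
smoothed_change = rolling_mean(daily_change)
```

Smoothing the daily percent changes dampens weekday and weekend swings, which is why the rolling figures are easier to compare across counties and states.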
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source/Credit: Michael Grogan https://github.com/MGCodesandStats https://github.com/MGCodesandStats/datasets/blob/master/cars.csv
Sample dataset for regression analysis. Given 5 attributes (age, gender, miles driven per day, debt, and income) predict how much someone will spend on purchasing a car. All 5 of the input attributes have been scaled to be in 0 to 1 range. Training set has 723 training examples. Test set has 242 test examples.
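The description does not say how the attributes were brought into the 0 to 1 range. Min-max scaling is one common way to do it and is shown here purely as an assumption, with a hypothetical helper name `min_max_scale`.

```python
def min_max_scale(column):
    """Linearly rescale a list of numbers so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical unscaled ages, for illustration only.
ages = [20, 30, 40]
scaled_ages = min_max_scale(ages)
```

If new examples are scaled later, the minimum and maximum from the training set should be reused so that training and test features stay on the same scale.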
This dataset will be used in an upcoming Galaxy Training Network tutorial (https://training.galaxyproject.org/training-material/topics/statistics/) on use of feedforward neural networks for regression analysis.