About Dataset
This dataset is designed for Formula 1 (F1) enthusiasts, researchers, and data scientists who want to analyze performance metrics of F1 drivers and cars. It includes telemetry data collected from the FastF1 and Ergast APIs and then preprocessed, offering insights into driver performance across various races and scenarios.
The dataset is structured to facilitate dynamic updates and high-quality visualizations, making it suitable for advanced analyses such as race strategy optimization, performance comparisons, and machine learning applications. Key metrics include lap times, sector splits, RPM, throttle, speed, and more, enabling users to identify critical performance bottlenecks and trends.
Features
- High-Resolution Telemetry Data: Temporal data sampled every tenth of a second, plus lap-based aggregates.
- Sector and Mini-Sector Analysis: Performance indices calculated for each driver in every mini-sector of each circuit.
- Driver and Race Comparisons: Ready-to-use datasets that facilitate multi-driver evaluations.
- Dynamic Database: Updated automatically with preprocessed data after each race.
Use Cases
- Build advanced visualizations to understand F1 performance dynamics.
- Apply machine learning models to predict driver or team success.
- Study the impact of car setups and driving styles on race outcomes.
Structure
- Driver Data: Detailed lap times, RPM, and braking data.
- Race Information: Metadata about circuits, weather, and track conditions.
- Preprocessed Tables: Optimized for quick access and analysis.
Intended Audience
- F1 fans exploring race telemetry.
- Researchers in motorsports analytics.
- Developers building visualization or simulation tools.
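The listing does not say how its mini-sector performance indices are computed. As an illustration only, here is a minimal sketch of one common approach, assuming each lap is available as arrays of distance from the start line (metres, increasing) and elapsed time (seconds) sampled roughly every tenth of a second: split the lap into equal-length mini-sectors, interpolate the elapsed time at each boundary, and compare drivers sector by sector. The function names and the default of 25 mini-sectors are hypothetical choices, not part of the dataset.

```python
import numpy as np


def mini_sector_times(distance_m, time_s, lap_length_m, n_sectors=25):
    """Split one lap into n equal-length mini-sectors and return the time spent in each.

    distance_m must increase along the lap (np.interp requires sorted x values);
    time_s is the elapsed lap time at each telemetry sample.
    """
    distance_m = np.asarray(distance_m, dtype=float)
    time_s = np.asarray(time_s, dtype=float)
    # Mini-sector boundaries along the lap, including the start (0) and the finish line
    boundaries = np.linspace(0.0, lap_length_m, n_sectors + 1)
    # Linearly interpolate the elapsed time at each boundary
    t_at_boundaries = np.interp(boundaries, distance_m, time_s)
    # Time spent inside each mini-sector
    return np.diff(t_at_boundaries)


def fastest_driver_per_mini_sector(laps, lap_length_m, n_sectors=25):
    """laps maps a driver code to (distance_m, time_s); returns the fastest driver per mini-sector."""
    drivers = list(laps)
    # One row per driver, one column per mini-sector
    times = np.vstack(
        [mini_sector_times(d, t, lap_length_m, n_sectors) for d, t in laps.values()]
    )
    return [drivers[i] for i in np.argmin(times, axis=0)]
```

A finer split (more mini-sectors) shows where on track the time is gained, at the cost of more interpolation noise when the sampling rate is low.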
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
Formula One is the highest class of international racing for open-wheel single-seater racing cars sanctioned by the Fédération Internationale de l'Automobile (FIA). Ever since its inaugural season in 1950, Formula 1 has been regarded as the pinnacle of motorsport.
This dataset contains detailed information about qualifying and race results for every track over the course of multiple seasons. Each season has its own directory with two sub-directories: Qualifying Results and Race Results. The Race Results directory contains an overall_race_results.csv file that summarizes the race results for the entire season, along with a separate .csv file for each race. The Qualifying Results directory contains one .csv file with the qualifying results for each race.
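A minimal sketch for loading one season with pandas. Only the Race Results and Qualifying Results sub-directories and the overall_race_results.csv file come from the description; the assumption that each season folder is named after the year (e.g. 2021) and sits directly under a root directory, and the load_season helper itself, are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_season(root, season):
    """Load one season's CSV files.

    Returns (overall, races, qualifying): the season summary as a DataFrame, plus
    dicts mapping each per-race file name (without .csv) to its DataFrame.
    Assumes the layout <root>/<season>/Race Results and <root>/<season>/Qualifying Results.
    """
    season_dir = Path(root) / str(season)
    race_dir = season_dir / "Race Results"
    quali_dir = season_dir / "Qualifying Results"

    # Season-wide summary of all race results
    overall = pd.read_csv(race_dir / "overall_race_results.csv")
    # One DataFrame per race, skipping the season summary file
    races = {
        p.stem: pd.read_csv(p)
        for p in sorted(race_dir.glob("*.csv"))
        if p.name != "overall_race_results.csv"
    }
    # One DataFrame per qualifying session
    qualifying = {p.stem: pd.read_csv(p) for p in sorted(quali_dir.glob("*.csv"))}
    return overall, races, qualifying
```

Usage, with a placeholder path: `overall, races, qualifying = load_season("path/to/dataset", 2021)`. Pre-1983 qualifying files load the same way; they simply contain a single row.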
For the 1982 season and earlier, each qualifying results file contains only one entry: the polesitter's. The lap times of the other drivers were not recorded, and the official website likewise lists only one entry under the qualifying results.
F1 is one of my favorite sports and I almost never miss a race 😄
The motivation behind creating this dataset was to learn more about web scraping and to perform a statistical analysis of the data. Some of the things you could do with the entire dataset are as follows:
- Identify the driver with the most poles
- Compare qualifying times of different drivers (championship contenders, team-mates, etc.)
- Determine how often a particular driver out-qualifies his team-mate
- Compare qualifying lap times of a race across previous seasons
- Identify the driver with the most wins at a particular track
- Analyze how the championship battle unfolded based on the points scored by the drivers (especially interesting for the 2021 F1 season 👀)
- Identify drivers with the highest number of wins, podiums, DNFs, etc.
- Compare the average lap times of different tracks to identify the slowest and fastest tracks on the calendar
- Compare the number of laps for each race in the season (Belgium 2021 being the clear winner 😂)
- Find out who won the Drivers' Championship based on the total number of points
- Find out who won the Constructors' Championship based on the total number of points for each team
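The last two questions come down to grouping and summing points. A minimal sketch, assuming overall_race_results.csv has one row per driver per race with columns named Driver, Car, and PTS; these names are hypothetical (modeled on the official results tables), so check the actual headers first. Simple sums match the modern points system, but older seasons used dropped-score rules and different constructor scoring, so totals for those years may not match the official standings.

```python
import pandas as pd


def championship_standings(overall, driver_col="Driver", team_col="Car", points_col="PTS"):
    """Return (drivers, constructors) standings as Series sorted by total points.

    The default column names are assumptions; pass the real headers if they differ.
    """
    # Coerce points to numbers; anything non-numeric counts as zero
    pts = pd.to_numeric(overall[points_col], errors="coerce").fillna(0)
    drivers = pts.groupby(overall[driver_col]).sum().sort_values(ascending=False)
    constructors = pts.groupby(overall[team_col]).sum().sort_values(ascending=False)
    return drivers, constructors
```

The first index entry of each returned Series is the champion under these assumptions; a cumulative sum per race instead of a total would show how the title battle unfolded.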
DNF: Did Not Finish. Commonly used for drivers who crashed or otherwise failed to complete the race.
DNQ: Did Not Qualify. Used in the qualifying datasets in place of missing values for drivers who failed to qualify.
NC: Not Classified. Used in the Position column for drivers who did not finish (DNF).
DQ: Disqualified. Drivers are generally disqualified from races for technical infringements or breaches of the sporting regulations (for example, Sebastian Vettel was disqualified from the 2021 Hungarian Grand Prix due to fuel irregularities and stripped of all the points he had earned by finishing in P2).
As I collect more data for previous seasons, I will create new versions of the dataset. The goal is to build an archive of qualifying and race data from 1950 to 2021. The dataset will also be updated when the 2022 season commences.
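Because these abbreviations share the Position column with numeric positions, pandas will likely read the column as text. A small sketch that splits it into a numeric position and a status code, assuming the column holds either an integer or one of the abbreviations above; the helper name split_position is hypothetical.

```python
import pandas as pd

# Abbreviations described in the glossary
STATUS_CODES = {"DNF", "DNQ", "NC", "DQ"}


def split_position(position):
    """Split a Position Series into a numeric position and a status code column."""
    pos_str = position.astype(str).str.strip()
    # Numbers become floats; abbreviations become NaN
    numeric = pd.to_numeric(pos_str, errors="coerce")
    # Keep only recognised abbreviations as the status; everything else is NaN
    status = pos_str.where(pos_str.isin(STATUS_CODES))
    return pd.DataFrame({"position": numeric, "status": status})
```

Keeping the status separate lets you count DNFs or disqualifications while still sorting and averaging the numeric positions.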
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains Formula 1 qualifying session data (2014–2024). Key points:
Scope & Source
Rows & Size
Columns & Meaning
- id: Auto-increment primary key (unique for each row).
- season (int): Year of the championship (e.g., 2014).
- round (int): Round number within that season (e.g., 1 = Australian GP).
- driver (string): Driver identifier (e.g., “HAM” for Hamilton).
- q1/q2/q3 (float or Null, seconds): Best lap time in each qualifying segment, in seconds.
- team (string): Constructor/team name during that season’s qualifying.
Missing Values
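A sketch of the table as SQLite DDL, with column types inferred from the descriptions above (an assumption; the actual file may declare them differently), plus a query that picks the fastest Q3 time in each round, i.e. the pole time before any grid penalties. ROW_NUMBER() requires SQLite 3.25 or later. The demo rows are copied from the structure example in this listing.

```python
import sqlite3

# Schema inferred from the column list; the SQL types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS qualifying (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    season INTEGER NOT NULL,
    round  INTEGER NOT NULL,
    driver TEXT NOT NULL,
    q1     REAL,  -- seconds; NULL when no time was set
    q2     REAL,
    q3     REAL,
    team   TEXT
);
"""

# Fastest Q3 lap per (season, round). NULL q3 rows are filtered out first,
# because SQLite sorts NULLs before numbers in ascending order.
POLE_QUERY = """
SELECT season, round, driver, team, q3
FROM (
    SELECT season, round, driver, team, q3,
           ROW_NUMBER() OVER (PARTITION BY season, round ORDER BY q3) AS rn
    FROM qualifying
    WHERE q3 IS NOT NULL
)
WHERE rn = 1
ORDER BY season, round;
"""


def pole_sitters(conn):
    """Return (season, round, driver, team, q3) for the fastest Q3 time of each round."""
    return conn.execute(POLE_QUERY).fetchall()


# Demo on an in-memory database with three rows from the structure example
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO qualifying (season, round, driver, q1, q2, q3, team) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (2014, 1, "HAM", 91.699, 102.890, 104.231, "Mercedes"),
        (2014, 1, "RIC", 90.775, 102.295, 104.548, "Red Bull"),
        (2014, 1, "ROS", 92.564, 102.264, 104.595, "Mercedes"),
    ],
)
print(pole_sitters(conn))  # [(2014, 1, 'HAM', 'Mercedes', 104.231)]
```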
Basic Statistics (2014–2024)
Time ranges:
Useful for trend analysis across circuits, tire and track evolution, and driver performance.
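Per-season time ranges can be recomputed from a DataFrame with the columns listed above (for example one loaded with pandas.read_sql). Because count skips nulls, it also shows how many drivers set a time in each segment. Raw times from different circuits, and from wet sessions, are not directly comparable, so treat season-wide ranges as a sanity check rather than a trend. The helper name is hypothetical.

```python
import pandas as pd


def season_time_ranges(df):
    """Per-season min/max lap time (seconds) of each segment and the number of non-null times."""
    return df.groupby("season")[["q1", "q2", "q3"]].agg(["min", "max", "count"])
```

The result has two-level columns, e.g. `ranges.loc[2014, ("q3", "min")]`.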
Use Cases
ML Modeling:
Race Grid Simulation: Reconstruct starting grids; integrate into broader race-prediction pipelines.
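A minimal sketch of reconstructing the qualifying order of a single session, assuming the three-segment knockout format: drivers with a Q3 time rank ahead of those whose last time came in Q2, who rank ahead of Q1-only drivers, and within each group the faster lap in that last segment wins. It does not apply grid penalties, pit-lane starts, or sprint-weekend rules, so the result can differ from the actual starting grid. The function name is hypothetical.

```python
import pandas as pd


def qualifying_order(session):
    """Order one session's drivers by knockout qualifying rules (no penalties applied)."""
    s = session.copy()
    # Last segment in which the driver recorded a time (3, 2 or 1; 0 if none)
    s["segment"] = 0
    for seg, col in ((1, "q1"), (2, "q2"), (3, "q3")):
        s.loc[s[col].notna(), "segment"] = seg
    # Lap time from that last segment decides the order within the group
    s["decisive_time"] = s["q3"].where(
        s["segment"] == 3, s["q2"].where(s["segment"] == 2, s["q1"])
    )
    s = s.sort_values(
        ["segment", "decisive_time"], ascending=[False, True], na_position="last"
    )
    # Positions 1..n in the sorted order
    s["grid"] = range(1, len(s) + 1)
    return s[["driver", "team", "segment", "decisive_time", "grid"]]
```

Pass the rows of one session, e.g. `qualifying_order(df[(df["season"] == 2014) & (df["round"] == 1)])`; note that `df["round"]` needs bracket access because `round` is also a DataFrame method.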
How to Load
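Python example (assuming the SQLite file qualifying.db):
import pandas as pd
import sqlite3

# Open the SQLite database file
conn = sqlite3.connect("qualifying.db")
# Read the whole qualifying table into a DataFrame, then release the connection
df = pd.read_sql("SELECT * FROM qualifying", conn)
conn.close()
print(df.head())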
Dataset Structure Example
id season round driver q1 q2 q3 team
0 1 2014 1 HAM 91.699 102.890 104.231 Mercedes
1 2 2014 1 RIC 90.775 102.295 104.548 Red Bull
2 3 2014 1 ROS 92.564 102.264 104.595 Mercedes
3 4 2014 1 MAG 90.949 103.247 105.745 McLaren
4 5 2014 1 ALO 91.388 102.805 105.819 Ferrari
Licensing & Attribution
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Formula 1 race data from the years 1950 to 2017. This data set is based on Formula 1 Race Data by ChrisG. As ChrisG indicated, the data was downloaded from http://ergast.com/mrd/ at the conclusion of the 2017 season.
We have simply converted the data from the original CSV files to an SQLite database, to enable queries with SQL.
We have provided two SQLite files:
- Formula1.sqlite: the entire database, with 13 tables (listed below)
- Formula1_4tables.sqlite: featuring four tables: races, drivers, circuits, and results
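A sketch of an SQL query against Formula1_4tables.sqlite that counts race wins per driver over a range of seasons. It assumes the tables keep the original Ergast CSV column names (results.raceId, results.driverId, results.positionOrder, drivers.forename and surname, races.year); if the conversion stored numbers as text, the comparisons may need a CAST. The top_winners helper is hypothetical.

```python
import sqlite3

# Column names assume the tables keep the original Ergast CSV headers.
WINS_QUERY = """
SELECT d.forename || ' ' || d.surname AS driver, COUNT(*) AS wins
FROM results AS r
JOIN drivers AS d  ON d.driverId = r.driverId
JOIN races   AS ra ON ra.raceId  = r.raceId
WHERE r.positionOrder = 1
  AND ra.year BETWEEN ? AND ?
GROUP BY d.driverId
ORDER BY wins DESC, driver
LIMIT ?;
"""


def top_winners(conn, first_year=1950, last_year=2017, limit=10):
    """Return (driver name, wins) pairs, most wins first, ties broken by name."""
    return conn.execute(WINS_QUERY, (first_year, last_year, limit)).fetchall()


# Usage against the four-table file:
# conn = sqlite3.connect("Formula1_4tables.sqlite")
# print(top_winners(conn, 2010, 2017))
```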
A great data set for practicing SQL queries and proceeding to data preparation and EDA.
Welcome to the Formula 1 dataset!
This page includes a Formula 1 dataset scraped from two main sources: the Formula 1 website and data.world. Both sources are highly reliable, and data from these sites has been used in past Kaggle competitions. Four CSV files are available on this page: circuits, constructors, drivers, and driverGrid. The data can be used for a range of purposes, such as exploratory data analysis, building a back-end database, or creating an application.
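For the back-end database use case, a minimal sketch that loads the four CSV files into one SQLite database with pandas. The file names (circuits.csv and so on) are assumed from the names listed above, and build_database is a hypothetical helper; no column names are assumed, since each file simply becomes a table of the same name.

```python
import sqlite3
from pathlib import Path

import pandas as pd

# File names are assumed to be <name>.csv; adjust if the downloaded files differ.
TABLES = ("circuits", "constructors", "drivers", "driverGrid")


def build_database(csv_dir, db_path):
    """Load each CSV into a table of the same name in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        for name in TABLES:
            df = pd.read_csv(Path(csv_dir) / f"{name}.csv")
            # Replace the table if the script is run again
            df.to_sql(name, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()
```

Once loaded, the tables can be joined with plain SQL, or exposed through whatever application layer sits on top of the database.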
I wish you all the best in your learning journey!