98 datasets found
  1. Formula 1 Races between 2000-2025

    • kaggle.com
    Updated Feb 1, 2025
    Cite
    emanuel Informatica (2025). Formula 1 Races between 2000-2025 [Dataset]. https://www.kaggle.com/datasets/emanuelinformatica/formula-1-races-between-2020-2025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 1, 2025
    Dataset provided by
    Kaggle
    Authors
    emanuel Informatica
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Description: F1 Races 2000-2024

    Overview

    This dataset consolidates data from Formula 1 races between 2000 and 2024, designed to facilitate predictive modeling and analytical tasks related to race outcomes. The dataset integrates information from multiple reliable sources, including the Ergast API, VisualCrossing API, and Wikipedia, enriched through feature engineering techniques to enhance its predictive power.

    Data Sources

    • Ergast API: Provides historical race data including driver and constructor statistics, race results, qualifying times, and more.
    • VisualCrossing API: Offers detailed weather data, including precipitation information during race events.
    • Wikipedia: Supplies circuit-specific details like the number of turns and track length through automated web scraping.

    Dataset Structure

    The dataset includes comprehensive race-related attributes categorized as follows:

    • Race Information: Year, round, circuit ID, and weather conditions.
    • Driver & Constructor Details: IDs, performance metrics, historical standings, and nationality.
    • Race Metrics: Grid position, lap times, pit stops, status, and final positions.
    • Engineered Features: Derived variables such as driver and constructor podium finish percentages, average positions, weighted probabilities based on circuit characteristics, and recent performance trends.

    Key Features

    • Top 3 Finish: Binary target variable indicating if a driver finished in the top 3.
    • Weather Conditions: Binary indicator for rainy conditions during the race.
    • Performance Metrics: Historical and seasonal averages for both drivers and constructors.
    • Track Characteristics: Number of turns and track length for each circuit.
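
As a rough illustration of the binary podium target and the podium finish percentage described for this dataset, here is a minimal sketch in plain Python. The field names (`driver`, `final_position`, `rain`) and the sample rows are hypothetical and are not taken from the dataset's actual columns.

```python
# Hypothetical rows: field names and values are assumptions for illustration,
# not the real column names of the dataset.
results = [
    {"driver": "driver_a", "final_position": 1, "rain": 0},
    {"driver": "driver_b", "final_position": 4, "rain": 0},
    {"driver": "driver_a", "final_position": 3, "rain": 1},
    {"driver": "driver_b", "final_position": 2, "rain": 1},
    {"driver": "driver_c", "final_position": None, "rain": 1},  # no classified finish
]


def top3_flag(position):
    """Return 1 for a podium finish (positions 1 to 3), otherwise 0."""
    return 1 if position is not None and 1 <= position <= 3 else 0


def podium_rate(rows):
    """Share of each driver's entries that ended on the podium."""
    starts, podiums = {}, {}
    for row in rows:
        driver = row["driver"]
        starts[driver] = starts.get(driver, 0) + 1
        podiums[driver] = podiums.get(driver, 0) + top3_flag(row["final_position"])
    return {driver: podiums[driver] / starts[driver] for driver in starts}


labels = [top3_flag(row["final_position"]) for row in results]
rates = podium_rate(results)
```

In a real modeling setup such a rate would normally be computed only from races that precede the one being predicted, so the feature does not leak the target.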

    Potential Use Cases

    • Predictive Modeling: Classify and predict podium finishes using machine learning algorithms.
    • Performance Analysis: Evaluate driver and constructor performance trends over multiple seasons.
    • Feature Engineering Practice: Apply advanced techniques to create new predictive features.

    Technical Details

    • Data Format: CSV
    • Total Records: Over 9,800 race results
    • Missing Values: Managed through strategic imputation methods, especially for rookie drivers and new race entries.
    • Class Imbalance: Addressed using SMOTE during modeling to ensure balanced predictive outcomes.

    Acknowledgments

    Special thanks to the contributors of the Ergast API, VisualCrossing API, and the Wikipedia community for providing essential data points that made this dataset possible.

  2. F1 Archive 1950-2022

    • kaggle.com
    zip
    Updated Jul 25, 2022
    Cite
    Rahil Parikh (2022). F1 Archive 1950-2022 [Dataset]. https://www.kaggle.com/datasets/rprkh15/f1-race-and-qualifying-data
    Explore at:
    Available download formats: zip (1760769 bytes)
    Dataset updated
    Jul 25, 2022
    Authors
    Rahil Parikh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Formula One is the highest class of international racing for open-wheel single-seater racing cars sanctioned by the Fédération Internationale de l'Automobile (FIA). Ever since its inaugural season in 1950, Formula 1 has been regarded as the pinnacle of motorsport.

    Content

    This dataset contains detailed information about qualifying and race results for all the tracks over the course of multiple seasons. There is a separate directory for each season. There are 2 sub-directories for each season, namely: Qualifying Results and Race Results. The Race Results directory contains an overall_race_results.csv file which summarizes the race results throughout the entire season. It also contains multiple .csv files for the results of each race in the season. The Qualifying Results directory contains multiple .csv files for the qualifying results before the start of each race.
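
The directory layout described above could be read along the following lines. This is a sketch under assumptions: the on-disk directory names are taken to be exactly `Race Results` and `Qualifying Results`, the season folder is assumed to be named by year, and the CSV headers in the demo are invented. Only `overall_race_results.csv` is a file name given in the description.

```python
import csv
import tempfile
from pathlib import Path


def load_season_summary(season_dir):
    """Read overall_race_results.csv from a season's Race Results directory."""
    path = Path(season_dir) / "Race Results" / "overall_race_results.csv"
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def list_qualifying_files(season_dir):
    """Return the per-race qualifying CSV files for one season, sorted by name."""
    return sorted((Path(season_dir) / "Qualifying Results").glob("*.csv"))


# Demo on a throwaway directory that imitates the described layout.
# The season name, file names, and headers below are hypothetical.
with tempfile.TemporaryDirectory() as root:
    season = Path(root) / "2021"
    (season / "Race Results").mkdir(parents=True)
    (season / "Qualifying Results").mkdir()
    (season / "Race Results" / "overall_race_results.csv").write_text(
        "Grand Prix,Winner\nExample GP,Example Driver\n", encoding="utf-8"
    )
    (season / "Qualifying Results" / "example_gp.csv").write_text(
        "Position,Driver\n1,Example Driver\n", encoding="utf-8"
    )
    summary = load_season_summary(season)
    qualifying_names = [path.name for path in list_qualifying_files(season)]
```

If the archive uses different directory names, only the two path fragments in the helper functions need to change.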

    Note

    For the 1982 season and earlier, each qualifying results file contains only one entry, that of the polesitter. The lap times of the other drivers are not included, and the official website lists only one entry under the qualifying results.

    Inspiration

    F1 is one of my favorite sports and I almost never miss a race 😄

    The motivation behind creating this dataset was to learn more about web scraping and to try performing a statistical analysis of the data. Some of the things you could do with the entire dataset are as follows:

    • Identify the driver with the most poles
    • Compare qualifying times of different drivers (championship contenders, team-mates, etc.)
    • Determine how often a particular driver out-qualifies his team-mate
    • Compare qualifying lap times of a race with those from previous seasons
    • Identify the driver with the most wins at a particular track
    • Analyze how the championship battle unfolded based on the number of points scored by the drivers (especially interesting for the 2021 F1 season 👀)
    • Identify drivers with the highest number of wins, podiums, DNFs, etc.
    • Compare the average lap times of different tracks to identify the slowest and fastest tracks on the calendar
    • Compare the number of laps for each race in the season (Belgium 2021 being the clear winner 😂)
    • Find out who won the Drivers' Championship based on the total number of points
    • Find out who won the Constructors' Championship based on the total number of points for each team

    Some Common F1 Terms You Might Come Across

    • DNF: Did Not Finish. Used for drivers who crashed or otherwise failed to complete the race.
    • DNQ: Did Not Qualify. This abbreviation replaces missing values in the qualifying datasets for drivers who failed to qualify.
    • NC: Not Classified. For drivers who DNF, the term NC is used in the Position column.
    • DQ: Disqualified. Drivers are generally disqualified from races for technical infringements or a breach of the sporting regulations. (Example: Sebastian Vettel was disqualified from the 2021 Hungarian Grand Prix due to fuel irregularities and stripped of all the points he earned from finishing the race in P2.)
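
Because the Position column can hold these abbreviations instead of a number, analysis code usually needs to separate the two cases first. A minimal sketch follows, assuming the cell contains either a plain integer or one of the four codes listed above. The helper name `parse_position` is hypothetical.

```python
# Codes taken from the glossary above. Treating every other non-numeric
# value as an error is an assumption of this sketch.
NON_NUMERIC_CODES = {"DNF", "DNQ", "NC", "DQ"}


def parse_position(raw):
    """Split a Position cell into (numeric_position, status_code)."""
    text = raw.strip().upper()
    if text in NON_NUMERIC_CODES:
        return None, text
    if text.isdigit():
        return int(text), None
    raise ValueError(f"unrecognised position value: {raw!r}")


parsed = [parse_position(value) for value in ["1", "12", "NC", "DQ", " dnq "]]
```

Keeping the status code next to the numeric position makes it possible to count DNFs or disqualifications without losing the finishing order of classified drivers.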

    Future Work

    As I collect more data for the previous seasons, I will create new versions for the dataset. The goal with this dataset is to create an archive of qualifying and race data from 1950-2021. The dataset will also be updated when the 2022 season commences.

  3. Formula 1 race data

    • zenodo.org
    zip
    Updated Jul 25, 2025
    Cite
    Saravanarajan G (2025). Formula 1 race data [Dataset]. http://doi.org/10.5281/zenodo.16420501
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Saravanarajan G
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a comprehensive, structured collection of historical Formula 1 race data compiled from the Ergast API. It is organized in a relational format across multiple CSV files, each capturing a different aspect of the sport. The `races.csv` file includes metadata on each race such as date, circuit, and season. The `results.csv` file provides final race outcomes for every driver, while `qualifying.csv` contains qualifying session results. `lap_times.csv` and `pit_stops.csv` offer granular, session-level data for each driver’s performance throughout the race. Additional files such as `drivers.csv` and `constructors.csv` provide biographical and team-related information, while `constructor_standings.csv`, `driver_standings.csv`, and `constructor_results.csv` track season-long performance. Files like `circuits.csv`, `status.csv`, and `seasons.csv` provide supporting metadata that enhances the usability and relational structure of the dataset. This dataset is well-suited for time series analysis, predictive modeling, performance evaluation, and motorsport analytics.
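
To illustrate how the relational files fit together, the sketch below joins result rows to race rows and totals points per driver and season. It uses small inline CSV strings rather than the real files. The column names (`raceId`, `driverId`, `positionOrder`, `points`, `year`, `name`) are assumed to follow the Ergast-style schema and should be checked against the actual headers.

```python
import csv
import io

# Inline stand-ins for races.csv and results.csv. All values are invented
# and the column names are an assumption about the schema.
races_csv = "raceId,year,name\n1,2021,Example Grand Prix\n2,2021,Sample Grand Prix\n"
results_csv = (
    "raceId,driverId,positionOrder,points\n"
    "1,10,1,25\n1,11,2,18\n2,11,1,25\n2,10,3,15\n"
)


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def join_results_to_races(results, races):
    """Attach the season and race name to every result row via raceId."""
    races_by_id = {race["raceId"]: race for race in races}
    joined = []
    for result in results:
        race = races_by_id.get(result["raceId"])
        if race is None:
            continue  # skip results that reference an unknown race
        joined.append({**result, "year": race["year"], "race_name": race["name"]})
    return joined


def season_points(joined):
    """Total points per (year, driverId) pair."""
    totals = {}
    for row in joined:
        key = (row["year"], row["driverId"])
        totals[key] = totals.get(key, 0.0) + float(row["points"])
    return totals


joined = join_results_to_races(read_rows(results_csv), read_rows(races_csv))
totals = season_points(joined)
```

The same lookup-by-key pattern would apply to the other files, for example resolving a driver identifier against `drivers.csv`.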

  4. Level of interest in Formula 1 in the U.S. 2025, by age

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Level of interest in Formula 1 in the U.S. 2025, by age [Dataset]. https://www.statista.com/statistics/1107573/fomula-one-series-interest-age/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 11, 2025 - Jun 17, 2025
    Area covered
    United States
    Description

    In 2025, younger adults in the United States tended to have more interest in Formula One, with ** percent of people aged between 18 and 29 following the racing series closely. Meanwhile, only ***** percent of those aged 65 and over did the same.

  5. Level of interest in Formula 1 in the U.S. 2025

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Level of interest in Formula 1 in the U.S. 2025 [Dataset]. https://www.statista.com/statistics/1107528/fomula-one-series-interest/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 11, 2025 - Jun 17, 2025
    Area covered
    United States
    Description

    In 2025, *********** adults in the United States followed Formula One to some extent. Meanwhile, ** percent of respondents said that they did not follow the racing series closely.

  6. F1 STATS

    • kaggle.com
    Updated Jul 1, 2025
    Cite
    Mateo Pineda Giraldo (2025). F1 STATS [Dataset]. https://www.kaggle.com/datasets/mateopinedagiraldo/f1-stats
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mateo Pineda Giraldo
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Mateo Pineda Giraldo

    Released under Apache 2.0

    Contents

  7. F1 race by race (1983-2021)

    • kaggle.com
    zip
    Updated Aug 28, 2021
    Cite
    Prajwal Sood (2021). F1 race by race (1983-2021) [Dataset]. https://www.kaggle.com/datasets/prajwalsood/f1-race-by-race-19832021
    Explore at:
    Available download formats: zip (951173 bytes)
    Dataset updated
    Aug 28, 2021
    Authors
    Prajwal Sood
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    F1 data, race by race, from 1983 onwards

    Content

    Each folder contains data for each race of that season

    Acknowledgements

    Collected using data available in public domain

    Inspiration

    My main goal is to come up with a predictive model for F1 races

  8. Share of F1 fans in the U.S. in 2024

    • statista.com
    Updated Aug 19, 2025
    Cite
    Statista (2025). Share of F1 fans in the U.S. in 2024 [Dataset]. https://www.statista.com/statistics/1498284/f1-fans-us/
    Dataset updated
    Aug 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2024
    Area covered
    United States
    Description

    In a September 2024 survey, ** percent of respondents in the United States identified as Formula 1 fans. Meanwhile, ** percent of respondents described themselves as diehard fans.

  9. Global Formula 1 Racing Market Growth Opportunities 2025-2032

    • statsndata.org
    excel, pdf
    Updated Jul 2025
    Cite
    Stats N Data (2025). Global Formula 1 Racing Market Growth Opportunities 2025-2032 [Dataset]. https://www.statsndata.org/report/formula-1-racing-market-377089
    Explore at:
    Available download formats: pdf, excel
    Dataset updated
    Jul 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Formula 1 Racing market, synonymous with high-octane excitement and cutting-edge technology, has grown into a multi-billion-dollar industry that captivates millions of fans worldwide. With a rich history dating back to 1950, Formula 1 has evolved not only as a thrilling motorsport but also as a significant busin

  10. Formula 1 total revenue 2017-2024

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). Formula 1 total revenue 2017-2024 [Dataset]. https://www.statista.com/statistics/1137226/formula-one-revenue/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2024, the total revenue of the Formula One Group amounted to around **** billion U.S. dollars, representing an increase of over 13 percent on the previous year. Since 2017, the group has been owned by Liberty Media Corporation.

  11. Breakdown of avid Formula 1 fans in Great Britain 2022, by gender

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). Breakdown of avid Formula 1 fans in Great Britain 2022, by gender [Dataset]. https://www.statista.com/statistics/1235232/formula-1-interest-level-by-gender/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 11, 2022 - Nov 30, 2022
    Area covered
    United Kingdom, Great Britain
    Description

    A November 2022 survey in Great Britain revealed that around ** percent of avid Formula One fans in the country were male. Meanwhile, ** percent of British F1 fans were female.

  12. Formula1-2024-Miami-Verstappen-telemetry

    • huggingface.co
    Updated May 5, 2024
    Cite
    Lucas Draichi (2024). Formula1-2024-Miami-Verstappen-telemetry [Dataset]. https://huggingface.co/datasets/Draichi/Formula1-2024-Miami-Verstappen-telemetry
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 5, 2024
    Authors
    Lucas Draichi
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Max Verstappen's Full Race Car Data: 2024 Miami Grand Prix

      Data Source
    

    Data obtained using the fastf1 API, ensuring reliability and accuracy in the collected data.

      Metrics
    

    The dataset comprises a comprehensive range of metrics crucial for analyzing Max Verstappen's performance during the 2024 Miami Grand Prix, including:

    • Data: Timestamps for each recorded data point.
    • RPM: Engine revolutions per minute, indicating engine performance and power delivery.
    • Speed:…

    See the full description on the dataset page: https://huggingface.co/datasets/Draichi/Formula1-2024-Miami-Verstappen-telemetry.

  13. Formula 1 (F1) trending tweets 🏎 🏁

    • kaggle.com
    Updated Aug 23, 2022
    Cite
    Kash (2022). Formula 1 (F1) trending tweets 🏎 🏁 [Dataset]. https://www.kaggle.com/datasets/kaushiksuresh147/formula-1-trending-tweets/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    Kaggle
    Authors
    Kash
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    • Formula One (also known as Formula 1 or F1) is the highest class of international auto racing for single-seater formula racing cars sanctioned by the Fédération Internationale de l'Automobile (FIA). The World Drivers' Championship, which became the FIA Formula One World Championship in 1981, has been one of the premier forms of racing around the world since its inaugural season in 1950. The word formula in the name refers to the set of rules to which all participants' cars must conform. A Formula One season consists of a series of races, known as Grands Prix, which take place worldwide on both purpose-built circuits and closed public roads.

    • The enthusiasm for F1 among fans is astonishing and has created quite a buzz on major social media platforms like Twitter. This dataset collects tweets posted with the #f1 hashtag.

    Information regarding the data

    • The data consists of 50k+ records in total, with 13 columns. Collection started on 25/7/2020, and the data will be updated regularly. The description of the features is given below.

    Inspiration

        "I am an artist, the track is my canvas and the car is my brush." – Graham Hill
    
  14. Deorbit Descent and Landing Flight 1 (DDL-F1)

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated May 31, 2025
    Cite
    NASA (2025). Deorbit Descent and Landing Flight 1 (DDL-F1) [Dataset]. https://catalog.data.gov/dataset/deorbit-descent-and-landing-flight-1-ddl-f1
    Dataset updated
    May 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This data was recorded during Flight 1 of the Blue Origin Deorbit, Descent, and Landing Tipping Point (BODDL-TP) Game Changing Development (GCD) Program. The flight included IMU, cameras for terrain relative navigation, and range and velocity lidar sensors. The flight was completed under NASA contract 80LARC19C0005 in October 2020.

  15. Grant Giving Statistics for The F1 Key Foundation

    • instrumentl.com
    Updated Feb 27, 2022
    Cite
    (2022). Grant Giving Statistics for The F1 Key Foundation [Dataset]. https://www.instrumentl.com/990-report/f1-key-foundation
    Dataset updated
    Feb 27, 2022
    Variables measured
    Total Assets, Total Giving, Average Grant Amount
    Description

    Financial overview and grant giving statistics of The F1 Key Foundation

  16. Formula 1 attendance 2024, by circuit

    • statista.com
    Updated Apr 22, 2025
    Cite
    Statista (2025). Formula 1 attendance 2024, by circuit [Dataset]. https://www.statista.com/statistics/271306/formula-1-revenue-in-2009-by-sector/
    Dataset updated
    Apr 22, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    In 2024, the attendance of the British Grand Prix amounted to around 480,000, making it the best-attended F1 race of that year. Meanwhile, the attendance of the Australian Grand Prix totaled over 450,000.

  17. FIA F1 (Formula 1) 1950-2020 data

    • kaggle.com
    zip
    Updated Jul 5, 2020
    Cite
    Aadil Tajani (2020). FIA F1 (Formula 1) 1950-2020 data [Dataset]. https://www.kaggle.com/aadiltajani/fia-f1-19502019-data
    Explore at:
    Available download formats: zip (58157 bytes)
    Dataset updated
    Jul 5, 2020
    Authors
    Aadil Tajani
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    FIA F1 (Formula 1) is a cutthroat motorsport competition that started in 1950, continues to this day, and attracts more and more fans to this heritage sport every year.

    Content

    I have included various datasets, such as race wins, Constructors' and Drivers' Championships, and fastest laps, for the years 1950-2019, and will add more recent data as soon as it is available.

  18. Section F1 Milk Sales

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). Section F1 Milk Sales [Dataset]. https://catalog.data.gov/dataset/section-f1-milk-sales-5b358
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    The survey interviewed 254 retail shops in 10 sub-cities of Addis Ababa: 30 supermarkets, 20 mini-markets, 100 regular shops, 80 dairy shops, and 24 open market shops selling dairy products. Details of the sampling strategy are found in the attachment. The survey collected information on the characteristics of each shop, details of the dairy products sold, prices, and quality. Policy makers, researchers, and other stakeholders can use this data to analyze the dairy value chain and dairy retailing practices in Ethiopia. This data set was collected through research of the project “Improving the evidence and policies for better performing livestock systems in Ethiopia”, led by the International Food Policy Research Institute as part of the Feed the Future Innovation Lab for Livestock Systems.

  19. Posanapalle V2 F1 Ortho Data Dataset

    • universe.roboflow.com
    zip
    Updated Jan 11, 2024
    Cite
    NMAnnotate (2024). Posanapalle V2 F1 Ortho Data Dataset [Dataset]. https://universe.roboflow.com/nmannotate/posanapalle-v2-f1-ortho-data
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    NMAnnotate
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Variables measured
    Palm Tree Crown Polygons
    Description

    Posanapalle V2 F1 Ortho Data

    ## Overview
    
    Posanapalle V2 F1 Ortho Data is a dataset for instance segmentation tasks - it contains Palm Tree Crown annotations for 273 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
    
  20. Public interest in Formula One in the U.S. 2023, by ethnicity

    • statista.com
    Updated Jul 24, 2025
    Cite
    Statista (2025). Public interest in Formula One in the U.S. 2023, by ethnicity [Dataset]. https://www.statista.com/statistics/1107577/fomula-one-series-interest-ethnicity/
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 16, 2023 - May 18, 2023
    Area covered
    United States
    Description

    Formula One is a motorsport discipline sanctioned by the FIA and owned by the Formula One Group. In a survey conducted in ********, around ** percent of Hispanic respondents in the United States were avid fans of Formula One.
