https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains detailed flight performance and delay information for domestic flights in 2024, merged from monthly BTS TranStats files into a single cleaned dataset. It includes over 7 million rows and 35 columns, providing comprehensive information on scheduled and actual flight times, delays, cancellations, diversions, and distances between airports. The dataset is suitable for exploratory data analysis (EDA), machine learning tasks such as delay prediction, time series analysis, and airline/airport performance studies.
Monthly CSV files for January–December 2024 were downloaded from the BTS TranStats On-Time Performance database, and 35 relevant columns were selected. The monthly files were merged into a single dataset using pandas, with cleaning steps including standardizing column names to snake_case (e.g., fl_date, dep_delay), converting fl_date to ISO format (YYYY-MM-DD), converting cancelled and diverted to binary indicators (0/1), and filling missing values in delay-related columns (carrier_delay, weather_delay, nas_delay, security_delay, late_aircraft_delay) with 0, while preserving all other values as in the original data.
Source: Available at BTS TranStats
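The merge and cleaning steps described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the dataset author's actual script: the function name `merge_and_clean`, the raw header spelling (upper-case with underscores), the raw date format, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# The five cause-of-delay columns named in the description above.
DELAY_COLS = [
    "carrier_delay", "weather_delay", "nas_delay",
    "security_delay", "late_aircraft_delay",
]

def merge_and_clean(monthly_frames, raw_date_format="%m/%d/%Y"):
    """Merge monthly frames and apply the cleaning steps listed above.

    raw_date_format is an assumption about the raw files, not a documented fact.
    """
    df = pd.concat(monthly_frames, ignore_index=True)
    # Assumes raw headers are upper-case with underscores (e.g. FL_DATE).
    df.columns = [c.strip().lower() for c in df.columns]
    # Convert the flight date to ISO format (YYYY-MM-DD).
    parsed = pd.to_datetime(df["fl_date"], format=raw_date_format)
    df["fl_date"] = parsed.dt.strftime("%Y-%m-%d")
    # Cancelled and diverted become integer 0/1 indicators.
    for col in ("cancelled", "diverted"):
        df[col] = df[col].astype(int)
    # Missing cause-of-delay minutes are filled with 0; other columns are untouched.
    df[DELAY_COLS] = df[DELAY_COLS].fillna(0)
    return df

nan = float("nan")
# Two tiny made-up "monthly" frames standing in for the real CSV files.
jan = pd.DataFrame({
    "FL_DATE": ["1/15/2024", "1/16/2024"], "DEP_DELAY": [-3.0, 25.0],
    "CANCELLED": [0.0, 0.0], "DIVERTED": [0.0, 0.0],
    "CARRIER_DELAY": [nan, 12.0], "WEATHER_DELAY": [nan, 0.0],
    "NAS_DELAY": [nan, 8.0], "SECURITY_DELAY": [nan, 0.0],
    "LATE_AIRCRAFT_DELAY": [nan, 0.0],
})
feb = pd.DataFrame({
    "FL_DATE": ["2/3/2024", "2/4/2024"], "DEP_DELAY": [nan, 30.0],
    "CANCELLED": [1.0, 0.0], "DIVERTED": [0.0, 0.0],
    "CARRIER_DELAY": [nan, 5.0], "WEATHER_DELAY": [nan, 20.0],
    "NAS_DELAY": [nan, 0.0], "SECURITY_DELAY": [nan, 0.0],
    "LATE_AIRCRAFT_DELAY": [nan, 0.0],
})

cleaned = merge_and_clean([jan, feb])
print(cleaned[["fl_date", "cancelled", "carrier_delay", "dep_delay"]])
```

Note that dep_delay for the cancelled flight stays missing, in line with the statement that only the five cause-of-delay columns are filled with 0.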
- flight_data_2024.csv: full cleaned dataset (~7M rows, 35 columns)
- flight_data_2024_sample.csv: sample dataset (10,000 rows)
- flight_data_2024_data_dictionary.csv: column names, data types, null percentage, and example values
- README.md: dataset overview and usage instructions
- LICENSE.txt: CC0 license
- dataset-metadata.json: Kaggle metadata for the dataset

| Column Name | Description |
|---|---|
| year | Year of flight |
| month | Month of flight (1–12) |
| day_of_month | Day of the month |
| day_of_week | Day of week (1=Monday … 7=Sunday) |
| fl_date | Flight date (YYYY-MM-DD) |
| op_unique_carrier | Unique carrier code |
| op_carrier_fl_num | Flight number for reporting airline |
| origin | Origin airport code |
| origin_city_name | Origin city name |
| origin_state_nm | Origin state name |
| dest | Destination airport code |
| dest_city_name | Destination city name |
| dest_state_nm | Destination state name |
| crs_dep_time | Scheduled departure time (local, hhmm) |
| dep_time | Actual departure time (local, hhmm) |
| dep_delay | Departure delay in minutes (negative if early) |
| taxi_out | Taxi out time in minutes |
| wheels_off | Wheels-off time (local, hhmm) |
| wheels_on | Wheels-on time (local, hhmm) |
| taxi_in | Taxi in time in minutes |
| crs_arr_time | Scheduled arrival time (local, hhmm) |
| arr_time | Actual arrival time (local, hhmm) |
| arr_delay | Arrival delay in minutes (negative if early) |
| cancelled | Cancelled flight indicator (0=No, 1=Yes) |
| cancellation_code | Reason for cancellation (if cancelled) |
| diverted | Diverted flight indicator (0=No, 1=Yes) |
| crs_elapsed_time | Scheduled elapsed time in minutes |
| actual_elapsed_time | Actual elapsed time in minutes |
| air_time | Flight time in minutes |
| distance | Distance between origin and destination (miles) |
| carrier_delay | Carrier-related delay in minutes |
| weather_delay | Weather-related delay in minutes |
| nas_delay | National Air System delay in minutes |
| security_delay | Security delay in minutes |
| late_aircraft_delay | Late aircraft delay in minutes |
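Several columns above (crs_dep_time, dep_time, wheels_off, wheels_on, crs_arr_time, arr_time) are local clock times in hhmm form, which cannot be subtracted directly as numbers. A small helper such as the one below converts them to minutes after midnight first. The name `hhmm_to_minutes` is hypothetical, and treating 2400 as midnight is an assumption for the sketch, not something the dataset description states.

```python
import math

def hhmm_to_minutes(value):
    """Convert an hhmm clock value (1435, "0905", 1435.0) to minutes after midnight.

    Returns None for missing values (None, empty string, NaN).
    """
    if value is None or value == "":
        return None
    number = float(value)
    if math.isnan(number):
        return None
    hours, minutes = divmod(int(number), 100)
    if number < 0 or hours > 24 or minutes > 59:
        raise ValueError(f"not a valid hhmm value: {value!r}")
    # Assumption: 2400 is treated as midnight (0 minutes).
    return (hours % 24) * 60 + minutes

print(hhmm_to_minutes(1435))    # 14:35
print(hhmm_to_minutes("0905"))  # 09:05
```

Because the times are local to each airport, differences between a departure and an arrival value also depend on time zones and midnight rollover; the elapsed-time columns are the safer source for durations.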
https://creativecommons.org/publicdomain/zero/1.0/
BACKGROUND The data contained in the compressed file has been extracted from the Marketing Carrier On-Time Performance (Beginning January 2018) data table of the "On-Time" database from the TranStats data library. The time period is indicated in the name of the compressed file; for example, XXX_XXXXX_2001_1 contains data of the first month of the year 2001.
RECORD LAYOUT Below are fields in the order that they appear on the records:

| Column | Description |
|---|---|
| Year | Year |
| Quarter | Quarter (1-4) |
| Month | Month |
| DayofMonth | Day of Month |
| DayOfWeek | Day of Week |
| FlightDate | Flight Date (yyyymmdd) |
| Marketing_Airline_Network | Unique Marketing Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| Operated_or_Branded_Code_Share_Partners | Reporting Carrier Operated or Branded Code Share Partners |
| DOT_ID_Marketing_Airline | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| IATA_Code_Marketing_Airline | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. |
| Flight_Number_Marketing_Airline | Flight Number |
| Originally_Scheduled_Code_Share_Airline | Unique Scheduled Operating Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| DOT_ID_Originally_Scheduled_Code_Share_Airline | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| IATA_Code_Originally_Scheduled_Code_Share_Airline | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. |
| Flight_Num_Originally_Scheduled_Code_Share_Airline | Flight Number |
| Operating_Airline | Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| DOT_ID_Operating_Airline | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| IATA_Code_Operating_Airline | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. |
| Tail_Number | Tail Number |
| Flight_Number_Operating_Airline | Flight Number |
| OriginAirportID | Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| OriginAirportSeqID | Origin Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time. |
| OriginCityMarketID | Origin Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market. |
| Origin | Origin Airport |
| OriginCityName | Origin Airport, City Name |
| OriginState | Origin Airport, State Code |
| OriginStateFips | Origin Airport, State Fips |
| OriginStateName | Origin Airport, State Name |
| OriginWac | Origin Airport, World Area Code |
| DestAirportID | Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| DestAirportSeqID | Destination Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time. |
| DestCityMarketID | Destination Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market. |
| Dest | Destination Airport |
| DestCityName | Destination Airport, City Name |
| DestState | Destination Airport, State Code |
| DestStateFips | De... |
Problem Statement: According to air travel consumer reports, a large proportion of consumer complaints are about frequent flight delays. Out of all the complaints received from consumers about airline services, 32% were related to cancellations, delays, or other deviations from the airlines’ schedules. There are unavoidable delays that can be caused by air traffic, no passengers at the airport, weather conditions, mechanical issues, passengers coming from delayed connecting flights, security clearance, and aircraft preparation.
Objective: The objective of this project is to identify the factors that contribute to avoidable flight delays. You are also required to build a model to predict if the flight will be delayed.
Note: Please refer to the Data Dictionary file for column descriptions of all three files.
https://www.usa.gov/government-works/
This dataset provides detailed information on flight arrivals and delays for U.S. airports, categorized by carriers. The data includes metrics such as the number of arriving flights, delays over 15 minutes, cancellation and diversion counts, and the breakdown of delays attributed to carriers, weather, NAS (National Airspace System), security, and late aircraft arrivals. Explore and analyze the performance of different carriers at various airports during this period. Use this dataset to gain insights into the factors contributing to delays in the aviation industry.
Purpose: The purpose of this dataset is to offer insights into the performance of U.S. carriers at various airports during August 2013 - August 2023, focusing on flight arrivals and delays. By providing detailed information on key metrics such as the number of arriving flights, delays over 15 minutes, cancellations, and diversions, the dataset aims to facilitate analyses of factors contributing to delays, including those attributed to carriers, weather, the National Airspace System (NAS), security, and late aircraft arrivals. Researchers, data scientists, and aviation enthusiasts can leverage this dataset to explore patterns, identify trends, and draw conclusions that contribute to a better understanding of the aviation industry's operational challenges.
Structure: The dataset is in tabular format, with rows representing unique combinations of year, month, carrier, and airport. Each row contains information on various metrics, including flight counts, delay counts, cancellation and diversion counts, and delay breakdowns by different factors. The columns provide specific details such as carrier codes and names, airport codes and names, and counts of delays attributed to carrier, weather, NAS, security, and late aircraft arrivals. The structured format ensures that users can easily query, analyze, and visualize the data to derive meaningful insights.
Usage: Researchers, analysts, and data enthusiasts can utilize this dataset for a variety of purposes, including but not limited to:
Performance Analysis: Assess the on-time performance of different carriers at specific airports and identify potential areas for improvement.
Trend Identification: Analyze temporal trends in delays, cancellations, and diversions to understand whether certain months or periods exhibit higher operational challenges.
Root Cause Analysis: Investigate the primary contributors to delays, such as carrier-related issues, weather conditions, NAS inefficiencies, security concerns, or late aircraft arrivals.
Benchmarking: Compare the performance of various carriers across different airports to identify industry leaders and areas requiring attention.
Predictive Modeling: Use historical data to develop predictive models for flight delays, aiding in the development of strategies to mitigate disruptions.
Industry Insights: Contribute to a broader understanding of the factors influencing operational efficiency within the U.S. aviation sector.
As users explore and analyze the dataset, they can gain valuable insights that may inform decision-making processes, improve operational strategies, and contribute to a more efficient and reliable air travel experience.
FLIGHTS_30m_SAMPLE_2m_ALL.csv (409.41 MB)
A random sample of 2 million records drawn from the DOT dataset at the link below (29,380,335 rows x 32 columns).
Airline Names linked from Dictionary; Column Headers Updated
Flight Delay and Cancellation Dataset (2019-2023)
1: FL_DATE 2: AIRLINE 3: AIRLINE_DOT 4: AIRLINE_CODE 5: DOT_CODE 6: FL_NUMBER 7: ORIGIN 8: ORIGIN_CITY 9: DEST 10: DEST_CITY 11: CRS_DEP_TIME 12: DEP_TIME 13: DEP_DELAY 14: TAXI_OUT 15: WHEELS_OFF 16: WHEELS_ON 17: TAXI_IN 18: CRS_ARR_TIME 19: ARR_TIME 20: ARR_DELAY 21: CANCELLED 22: CANCELLATION_CODE 23: DIVERTED 24: CRS_ELAPSED_TIME 25: ELAPSED_TIME 26: AIR_TIME 27: DISTANCE 28: DELAY_DUE_CARRIER 29: DELAY_DUE_WEATHER 30: DELAY_DUE_NAS 31: DELAY_DUE_SECURITY 32: DELAY_DUE_LATE_AIRCRAFT
https://creativecommons.org/publicdomain/zero/1.0/
The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in this dataset of 2015 flight delays and cancellations.
The flight delay and cancellation data was collected and published by the DOT's Bureau of Transportation Statistics.
What flights does the reporting cover? The rule requires carriers to report on domestic operations to and from U.S. airports.
A flight is considered delayed when it arrives 15 or more minutes later than scheduled. Delayed minutes are calculated for delayed flights only.
When multiple causes are assigned to one delayed flight, each cause is prorated based on delayed minutes it is responsible for. The displayed numbers are rounded and may not add up to the total.
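As a worked illustration of the prorating rule above, the sketch below splits one delayed flight across its causes in proportion to the delay minutes each cause is responsible for. The function name `prorate`, the cause keys, and the sample minutes are hypothetical, and the exact rounding applied to the published figures is not described here, so none is applied.

```python
CAUSES = ("carrier", "weather", "nas", "security", "late_aircraft")

def prorate(cause_minutes):
    """Split one delayed flight across causes in proportion to delay minutes.

    cause_minutes maps a cause name to the minutes attributed to it.
    The returned shares sum to 1.0 for a delayed flight and are all 0.0
    when no delay minutes were reported.
    """
    total = sum(cause_minutes.get(cause, 0) for cause in CAUSES)
    if total <= 0:
        return {cause: 0.0 for cause in CAUSES}
    return {cause: cause_minutes.get(cause, 0) / total for cause in CAUSES}

# Made-up flight: 60 delay minutes, 40 from the carrier and 20 from weather.
shares = prorate({"carrier": 40, "weather": 20})
for cause, share in shares.items():
    print(f"{cause}: {share:.2f}")
```

Under this reading, the flight counts as roughly 0.67 of a carrier-delayed flight and 0.33 of a weather-delayed flight, which is why per-cause flight counts can be fractional and may not add up exactly after rounding.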
The marketing carrier networks are: Alaska Airlines (AS)*, Allegiant Air (G4), American Airlines (AA)*, Delta Air Lines (DL)*, Frontier Airlines (F9), Hawaiian Airlines (HA)*, JetBlue Airways (B6), Southwest Airlines (WN), Spirit Airlines (NK), United Airlines (UA)*
*Includes branded code-share partners
The reporting airlines are: Alaska Airlines (AS), Allegiant Air (G4), American Airlines (AA), Delta Air Lines (DL), Endeavor Air (9E), Envoy Air (MQ), Frontier Airlines (F9), Hawaiian Airlines (HA), Horizon Air (QX), JetBlue Airways (B6), Mesa Airlines (YV), PSA Airlines (OH), Republic Airlines (YX), SkyWest Airlines (OO), Southwest Airlines (WN), Spirit Airlines (NK), United Airlines (UA)
The airlines report the causes of delays in five broad categories:
- Air Carrier: The cause of the cancellation or delay was due to circumstances within the airline's control (e.g. maintenance or crew problems, aircraft cleaning, baggage loading, fueling, etc.).
- Extreme Weather: Significant meteorological conditions (actual or forecasted) that, in the judgment of the carrier, delay or prevent the operation of a flight, such as a tornado, blizzard or hurricane.
- National Aviation System (NAS): Delays and cancellations attributable to the national aviation system that refer to a broad set of conditions, such as non-extreme weather conditions, airport operations, heavy traffic volume, and air traffic control.
- Late-arriving aircraft: A previous flight with the same aircraft arrived late, causing the present flight to depart late.
- Security: Delays or cancellations caused by evacuation of a terminal or concourse, re-boarding of aircraft because of a security breach, inoperative screening equipment and/or long lines in excess of 29 minutes at screening areas.
https://www.usa.gov/government-works/
This data looks at how market structure affects delays for US domestic flights between 2004 and 2017.
Data on airline delays come from the Airline On-Time Performance Data (OTPD) from the US Bureau of Transportation Statistics. The data on tail numbers and seat capacity come from the Federal Aviation Administration Aircraft Registry. The data on flight-related weather come from the Local Climatological Data (LCD) provided by the National Centers for Environmental Information.
The data has 61 features and more than 1M records.
See the attached doc file for the metadata.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains historical flight records, including airline names, flight numbers, departure and arrival airports, day of the week, scheduled departure time, flight duration, and delay status. It is designed for machine learning applications in predicting flight delays based on real-world factors. The dataset provides valuable insights for airline operations, passenger experience improvements, and predictive analytics. With features like departure time and flight length, it enables researchers and data scientists to develop models for delay classification and regression. This dataset is ideal for training machine learning algorithms to anticipate delays and optimize air traffic scheduling.
- year
- month
- carrier: Abbreviation of carrier.
- carrier_name: The actual carrier name.
- airport: Abbreviation of airport.
- airport_name: The actual airport name.
- arr_flights: Number of flights that arrived at the airport.
- arr_del15: Number of flights delayed.
- carrier_ct: Number of flights delayed due to air carrier.
- weather_ct: Number of flights delayed due to weather.
- nas_ct: Number of flights delayed due to National Aviation System.
- security_ct: Number of flights delayed due to security.
- late_aircraft_ct: Number of flights delayed due to a previous flight.
- arr_cancelled: Number of flights that were cancelled.
- arr_diverted: Number of flights that were diverted.
- arr_delay: Time of delayed flights.
- carrier_delay: Time of delayed flights due to air carrier.
- weather_delay: Time of delayed flights due to weather.
- nas_delay: Time of delayed flights due to National Aviation System.
- security_delay: Time of delayed flights due to security.
- late_aircraft_delay: Time of delayed flights due to a previous flight.
Airline Delays for December 2019 and 2020.
Description: Summary data counts for airline per carrier per US city.
Usage: airline_delay
Format: A data frame with 3351 rows and 21 variables.
year Year data collected
month Numeric representation of the month
carrier Carrier.
carrier_name Carrier Name.
airport Airport code.
airport_name Name of airport.
arr_flights Number of flights arriving at airport
arr_del15 Number of flights more than 15 minutes late
carrier_ct Number of flights delayed due to air carrier. (e.g. no crew)
weather_ct Number of flights delayed due to weather.
nas_ct Number of flights delayed due to National Aviation System (e.g. heavy air traffic).
security_ct Number of flights delayed due to a security breach.
late_aircraft_ct Number of flights delayed as a result of another flight on the same aircraft being delayed
arr_cancelled Number of cancelled flights
arr_diverted Number of flights that were diverted
arr_delay Total time (minutes) of delayed flight.
carrier_delay Total time (minutes) of delay due to air carrier
weather_delay Total time (minutes) of delay due to inclement weather.
nas_delay Total time (minutes) of delay due to National Aviation System.
security_delay Total time (minutes) of delay as a result of a security issue.
late_aircraft_delay Total time (minutes) of delayed flights as a result of a previous flight on the same airplane being late.
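With the variables listed above, two common summaries are the share of arriving flights that were late and the share of delay minutes attributable to each cause. The pandas sketch below shows one way to compute both. The two rows are made-up values for illustration only, not figures from the dataset, and the derived column name `delay_rate` is an assumption.

```python
import pandas as pd

# Made-up rows using the variable names listed above.
rows = pd.DataFrame({
    "carrier": ["AA", "DL"],
    "airport": ["ATL", "ATL"],
    "arr_flights": [200, 400],
    "arr_del15": [50, 60],
    "arr_delay": [3000, 2400],
    "carrier_delay": [1200, 600],
    "weather_delay": [300, 0],
    "nas_delay": [600, 1200],
    "security_delay": [0, 0],
    "late_aircraft_delay": [900, 600],
})

cause_cols = [
    "carrier_delay", "weather_delay", "nas_delay",
    "security_delay", "late_aircraft_delay",
]

# Share of arriving flights that were more than 15 minutes late.
rows["delay_rate"] = rows["arr_del15"] / rows["arr_flights"]

# Share of total delay minutes attributable to each cause, row by row.
cause_shares = rows[cause_cols].div(rows["arr_delay"], axis=0)

print(rows[["carrier", "airport", "delay_rate"]])
print(cause_shares.round(2))
```

Rows with zero arriving flights or zero delay minutes would produce missing or infinite values here, so a real analysis would filter or guard those cases first.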
This dataset was created by AlgorithmicDeer
US Domestic Flights Delay: flight delays in the months of January, August, November and December of 2016
This project is about predicting if a flight will be delayed by over 15 minutes upon arrival, with Scikit-learn Decision Tree Classifier, using US flight data in 2022. Here is the URL of the dataset and variables description: https://www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGK&QO_fu146_anzr=b0-gvzr
Context The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. This dataset is collected from the Bureau of Transportation Statistics, Govt. of the USA. This data is open-sourced under U.S. Govt. Works. I downloaded 12 CSV files, one for each month of 2022. This dataset contains all US domestic flights in 2022.
Description of Columns
• Quarter: Quarter (1-4)
• Month: Month
• DayofMonth: Day of Month
• DayOfWeek: Day of Week
• FlightDate: Date of the Flight
• Marketing_Airline_Network: Airline Identifier
• OriginCityName: Origin Airport, City Name
• DestCityName: Destination Airport, City Name
• DepDelay: Difference in minutes between scheduled and actual departure time. Early departures show negative numbers
• ArrDelay: Difference in minutes between scheduled and actual arrival time. Early arrivals show negative numbers
• Cancelled: Cancelled Flight (1=Yes)
• Diverted: Diverted Flight (1=Yes)
• AirTime: Flight Time, in Minutes
• Distance: Distance between airports (miles)
• CarrierDelay: Delay caused by the airline in minutes
• WeatherDelay: Delay caused by weather
• NASDelay: Delay caused by air system
• SecurityDelay: Delay caused by security reasons
• LateAircraftDelay: Delay caused by another flight on the same aircraft being delayed
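The prediction target described for this project (arrival delayed by over 15 minutes) can be derived from ArrDelay along the lines below. This is only a sketch of the labeling step, not the project's code: the function name `arrival_delay_label` is hypothetical, and returning None for a missing ArrDelay is an assumption about how rows without an arrival delay might be handled.

```python
def arrival_delay_label(arr_delay, threshold=15.0):
    """Return 1 if the flight arrived more than `threshold` minutes late, else 0.

    Returns None when ArrDelay is missing (None or NaN), so those rows can be
    dropped or handled separately before training a classifier.
    """
    if arr_delay is None or arr_delay != arr_delay:  # NaN is not equal to itself
        return None
    return 1 if arr_delay > threshold else 0

# Made-up ArrDelay values: early, exactly 15 minutes late, 16 minutes late, missing.
labels = [arrival_delay_label(x) for x in (-5.0, 15.0, 16.0, None)]
print(labels)
```

The comparison is strict because the project says "over 15 minutes"; a definition of "15 or more minutes" would use `>=` instead and would label the second example as delayed.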
This dataset was created by Patrick Zelazko
Released under Other (specified in description)
https://creativecommons.org/publicdomain/zero/1.0/
This is a smaller sampled version of the 2015 Flight Delays and Cancellations dataset.
Original Dataset can be found at: https://www.kaggle.com/datasets/usdot/flight-delays?datasetId=810&sortBy=voteCount&select=airlines.csv
The flight delay and cancellation data was collected and published by the DOT's Bureau of Transportation Statistics.
This dataset was created by Emin Hashimi
PREDICT THE FLIGHT DELAY
Flight delays not only irritate air passengers and disrupt their schedules but also cause:
● a decrease in efficiency
● an increase in capital costs
● reallocation of flight crews and aircraft
● additional crew expenses
The aim is to predict whether the given flight will be delayed or not. The accurate prediction of flight delays will help all players in the air travel ecosystem to set up effective action plans to reduce the impact of the delays and avoid loss of time, capital and resources.
We can perform exploratory data analysis using visualization to generate insights that help the business understand what factors may cause flight delays. Optionally, we may use an IATA code reference (an external data source, searchable on the internet) to decode the airport or flight codes for a better explanation of variables in visualizations.
We also need to build a flight delay predictive model using Machine Learning techniques. You may derive new features from the existing features and also from the domain knowledge, which may help in improving the model efficiency.
Attributes Data Description:
- FL_DATE: Flight Date
- OP_UNIQUE_CARRIER: Unique Carrier Code
- OP_CARRIER: Code assigned by IATA and commonly used to identify a carrier.
- TAIL_NUM: Tail Number
- OP_CARRIER_FL_NUM: Flight Number
- ORIGIN_AIRPORT_ID: Origin Airport ID. An identification number assigned by US DOT to identify a unique airport.
- ORIGIN: Origin Airport
- DEST_AIRPORT_ID: Destination Airport ID. An identification number assigned by US DOT to identify a unique airport.
- DEST: Destination Airport
- CRS_DEP_TIME: CRS Departure Time (local time: hhmm)
- DEP_TIME: Actual Departure Time (local time: hhmm)
- TAXI_OUT: Taxi Out Time, in Minutes
- WHEELS_OFF: Wheels Off Time (local time: hhmm)
- WHEELS_ON: Wheels On Time (local time: hhmm)
- TAXI_IN: Taxi In Time, in Minutes
- CRS_ARR_TIME: CRS Arrival Time (local time: hhmm)
- ARR_TIME: Actual Arrival Time (local time: hhmm)
- ARR_DELAY_GROUP: Arrival Group (TARGET VARIABLE)
- CANCELLED: Cancelled Flight Indicator (1=Yes)
- DISTANCE: Distance between airports (miles)
Target:
- ARR_DELAY_GROUP: -1 (early departure)
- ARR_DELAY_GROUP: 0 (ontime)
- ARR_DELAY_GROUP: 1 (delayed)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset makes all of these possible. Perfect for a school project, research project or resume builder.
This dataset contains all flight information, including cancellations and delays by airline, dating back to January 2018.
For your convenience you can use the Combined_Flights_XXXX.csv or Combined_Flights_XXXX.parquet files to access the combined data for the entire year. Columns that are mostly null in the original dataset have been filtered out of these files.
The raw data including all columns by month can be found in the files named Flights_XXXX_X.csv
The data contained in the compressed file has been extracted from the Marketing Carrier On-Time Performance (Beginning January 2018) data table of the "On-Time" database from the TranStats data library. The time period is indicated in the name of the compressed file; for example, XXX_XXXXX_2001_1 contains data of the first month of the year 2001.
Below are fields in the order that they appear on the records:

| Column | Description |
|---|---|
| Year | Year |
| Quarter | Quarter (1-4) |
| Month | Month |
| DayofMonth | Day of Month |
| DayOfWeek | Day of Week |
| FlightDate | Flight Date (yyyymmdd) |
| Marketing_Airline_Network | Unique Marketing Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| Operated_or_Branded_Code_Share_Partners | Reporting Carrier Operated or Branded Code Share Partners |
| DOT_ID_Marketing_Airline | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| IATA_Code_Marketing_Airline | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. |
| Flight_Number_Marketing_Airline ...
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains U.S. domestic airline delay statistics by cause over almost two decades. Each row represents a combination of:
- Year & month
- Operating carrier (e.g. Delta, SkyWest, etc.)
- Origin airport (IATA code + full airport name)
For every (carrier, airport, month) pair, the dataset includes:
- Total number of flights
- Number of delayed flights
Delay minutes by cause, typically split into:
- Carrier-related delays
- Weather delays
- NAS / air-traffic-system delays
- Security delays
- Late aircraft delays
Why this dataset is useful
- Explore which airlines and airports are most delay-prone
- Compare delay causes over time (e.g. weather vs. late aircraft)
- Build time-series models to forecast delays
- Create dashboards for aviation analytics or route planning
Possible projects
- Ranking airports by reliability
- Visualizing delay trends from 2003–2022
- Analyzing which causes dominate in different seasons
- Training ML models to predict delay severity by airport, airline, and month
The file Airline_Delay_Cause.csv is around 42 MB and is ready for use in pandas, Polars, or any other data-analysis tool.