https://creativecommons.org/publicdomain/zero/1.0/
Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.
This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.
The dataset provided here is a simulated example and was generated using the online platform found at Mockaroo. This web-based tool offers a service that enables the creation of customizable Synthetic datasets that closely resemble real data. It is primarily intended for use by developers, testers, and data experts who require sample data for a range of uses, including testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.
Cover Photo by: Kevin Woblick on Unsplash
Thumbnail by: Airplane icons created by Freepik - Flaticon
The number of flights performed globally by the airline industry has increased steadily since the early 2000s and reached **** million in 2019. However, due to the coronavirus pandemic, the number of flights dropped to **** million in 2020. The flight volume increased again in the following years and was forecasted to reach ** million in 2025.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides detailed information on airline flight routes, fares, and passenger volumes within the United States from 1993 to 2024. The data includes metrics such as the origin and destination cities, distances between airports, the number of passengers, and fare information segmented by different airline carriers. It serves as a comprehensive resource for analyzing trends in air travel, pricing, and carrier competition over a span of three decades.
https://www.gnu.org/licenses/gpl-3.0-standalone.html
This record is a global open-source passenger air traffic dataset primarily dedicated to the research community. It gives the seating capacity available on each origin-destination route for a given year, 2019, and the associated aircraft and airline when this information is available. Context on the original work is given in the related article (https://journals.open.tudelft.nl/joas/article/download/7201/5683) and on the associated GitHub page (https://github.com/AeroMAPS/AeroSCOPE/). A simple data exploration interface will be available at www.aeromaps.eu/aeroscope.
The dataset was created by aggregating various available open-source databases with limited geographical coverage. It was then completed using a route database created by parsing Wikipedia and Wikidata, on which the traffic volume was estimated using a machine learning algorithm (XGBoost) trained using traffic and socio-economic data.
1- DISCLAIMER
The dataset was gathered to allow highly aggregated analyses of the air traffic, at the continental or country levels. At the route level, the accuracy is limited as mentioned in the associated article, and improper usage could lead to erroneous analyses. Although all sources used are open to everyone, the Eurocontrol database is only freely available to academic researchers. It is used in this dataset in a very aggregated way and under several levels of abstraction. As a result, it is not distributed in its original format as specified in the contract of use. As a general rule, we decline any responsibility for any use that is contrary to the terms and conditions of the various sources that are used. In case of commercial use of the database, please contact us in advance.
2- DESCRIPTION
Each data entry represents an (Origin-Destination-Operator-Aircraft type) tuple. Please refer to the support article for more details (see above). The dataset contains the following columns:
"First column": index
airline_iata: IATA code of the operator in nominal cases. An ICAO -> IATA code conversion was performed for some sources, and the ICAO code was kept if no match was found.
acft_icao: ICAO code of the aircraft type
acft_class: Aircraft class identifier, own classification.
  WB: Wide Body
  NB: Narrow Body
  RJ: Regional Jet
  PJ: Private Jet
  TP: Turbo Propeller
  PP: Piston Propeller
  HE: Helicopter
  OTHER
seymour_proxy: Aircraft code for Seymour Surrogate (https://doi.org/10.1016/j.trd.2020.102528), own classification to derive a proxy aircraft when the nominal aircraft type is unavailable in the aircraft performance model.
source: Original data source for the record, before compilation and enrichment.
  ANAC: Brazilian Civil Aviation Authorities
  AUS Stats: Australian Civil Aviation Authorities
  BTS: US Bureau of Transportation Statistics T100
  Estimation: Own model, estimation on Wikipedia-parsed route database
  Eurocontrol: Aggregation and enrichment of R&D database
  OpenSky
  World Bank
seats: Number of seats available for the data entry, AFTER airport residual scaling
n_flights: Number of flights of the data entry, when available
iata_departure, iata_arrival: IATA code of the origin and destination airports. Some BTS in-house identifiers could remain, but this is marginal.
departure_lon, departure_lat, arrival_lon, arrival_lat: Origin and destination coordinates, could be NaN if the IATA identifier is erroneous
departure_country, arrival_country: Origin and destination country ISO2 code. WARNING: disable NA (Namibia) as default NaN at import
departure_continent, arrival_continent: Origin and destination continent code. WARNING: disable NA (North America) as default NaN at import
seats_no_est_scaling: Number of seats available for the data entry, BEFORE airport residual scaling
distance_km: Flight distance (km)
ask: Available Seat Kilometres
rpk: Revenue Passenger Kilometres (simple calculation from ASK using IATA average load factor)
fuel_burn_seymour: Fuel burn per flight (kg) when seymour proxy available
fuel_burn: Total fuel burn of the data entry (kg)
co2: Total CO2 emissions of the data entry (kg)
domestic: Domestic/international boolean (Domestic=1, International=0)
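The two import warnings above matter in practice when the file is loaded with pandas, whose CSV reader treats the string "NA" as a missing value by default. The following is a minimal sketch of one way to keep "NA" as a literal code; the in-memory sample rows and their values are purely illustrative and are not taken from the dataset.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real file; the rows are illustrative only.
sample_csv = io.StringIO(
    "iata_departure,iata_arrival,departure_country,departure_continent,seats\n"
    "WDH,JFK,NA,AF,100\n"
    "JFK,WDH,US,NA,\n"
)

# keep_default_na=False stops pandas from turning the string "NA" into NaN.
# na_values=[""] still lets genuinely empty fields be read as missing.
df = pd.read_csv(sample_csv, keep_default_na=False, na_values=[""])

print(df[["departure_country", "departure_continent", "seats"]])
```

With these options the Namibia country code and the North America continent code survive as text, while the empty seats field in the second row is still read as NaN.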
3- Citation Please cite the support paper instead of the dataset itself.
Salgas, A., Sun, J., Delbecq, S., Planès, T., & Lafforgue, G. (2023). Compilation of an open-source traffic and CO2 emissions dataset for commercial aviation. Journal of Open Aviation Science. https://doi.org/10.59490/joas.2023.7201
As new technologies are developed to handle the complexities of the Next Generation Air Transportation System (NextGen), it is increasingly important to address both current and future safety concerns along with the operational, environmental, and efficiency issues within the National Airspace System (NAS). In recent years, the Federal Aviation Administration’s (FAA) safety offices have been researching ways to utilize the many safety databases maintained by the FAA, such as those involving flight recorders, radar tracks, weather, and many other high-volume sensors, in order to monitor this unique and complex system. Although a number of current technologies do monitor the frequency of known safety risks in the NAS, very few methods currently exist that are capable of analyzing large data repositories with the purpose of discovering new and previously unmonitored safety risks. While monitoring the frequency of known events in the NAS enables mitigation of already identified problems, a more proactive approach of finding unidentified issues still needs to be addressed. This is especially important in the proactive identification of new, emergent safety issues that may result from the planned introduction of advanced NextGen air traffic management technologies and procedures. Development of an automated tool that continuously evaluates the NAS to discover both events exhibiting flight characteristics indicative of safety-related concerns as well as operational anomalies will heighten the awareness of such situations in the aviation community and serve to increase the overall safety of the NAS. This paper discusses the extension of previous anomaly detection work to identify operationally significant flights within the highly complex airspace encompassing the New York area of operations, focusing on the major airports of Newark International (EWR), LaGuardia International (LGA), and John F. Kennedy International (JFK). 
In addition, flight traffic in the vicinity of Denver International (DEN) airport/airspace is also investigated to evaluate the impact on operations due to variances in seasonal weather and airport elevation. From our previous research, subject matter experts determined that some of the identified anomalies were significant, but could not reach conclusive findings without additional supportive data. To advance this research further, causal examination using domain experts is continued along with the integration of air traffic control (ATC) voice data to shed much needed insight into resolving which flight characteristic(s) may be impacting an aircraft's unusual profile. Once a flight characteristic is identified, it could be included in a list of potential safety precursors. This paper also describes a process that has been developed and implemented to automatically identify and produce daily reports on flights of interest from the previous day.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The European Flights Dataset is a tabulated dataset of more than 680,000 air traffic records, including instrument flight rules (IFR) arrivals and operations at major European airports from January 2016 to May 2022.
2) Data Utilization
(1) The European Flights Dataset has the following characteristics:
• Each row contains 14 key items, including year, month, flight date, airport code and name, country name, and number of departures, arrivals, and total flights based on IFR.
• The data are segmented by airport, country, and month, so they are well structured for analyzing time series and spatial changes in European air traffic.
(2) The European Flights Dataset can be used for:
• Analysis of Air Traffic Trends and Recovery: Using IFR operational performance by year, month, and airport, you can analyze changes in air traffic before and after the pandemic, seasonal trends, and speed of recovery.
• Airport and Country Comparison Study: National and airport performance data can be used to compare and evaluate major hub airports, cross-country aviation network structure, policy effectiveness, and more.
https://www.usa.gov/government-works/
This dataset provides detailed information on flight arrivals and delays for U.S. airports, categorized by carriers. The data includes metrics such as the number of arriving flights, delays over 15 minutes, cancellation and diversion counts, and the breakdown of delays attributed to carriers, weather, NAS (National Airspace System), security, and late aircraft arrivals. Explore and analyze the performance of different carriers at various airports during this period. Use this dataset to gain insights into the factors contributing to delays in the aviation industry.
Purpose: The purpose of this dataset is to offer insights into the performance of U.S. carriers at various airports during August 2013 - August 2023, focusing on flight arrivals and delays. By providing detailed information on key metrics such as the number of arriving flights, delays over 15 minutes, cancellations, and diversions, the dataset aims to facilitate analyses of factors contributing to delays, including those attributed to carriers, weather, the National Airspace System (NAS), security, and late aircraft arrivals. Researchers, data scientists, and aviation enthusiasts can leverage this dataset to explore patterns, identify trends, and draw conclusions that contribute to a better understanding of the aviation industry's operational challenges.
Structure: The dataset is structured in a tabular format, with rows representing unique combinations of year, month, carrier, and airport. Each row contains information on various metrics, including flight counts, delay counts, cancellation and diversion counts, and delay breakdowns by different factors. The columns provide specific details such as carrier codes and names, airport codes and names, and counts of delays attributed to carrier, weather, NAS, security, and late aircraft arrivals. The structured format ensures that users can easily query, analyze, and visualize the data to derive meaningful insights.
Usage: Researchers, analysts, and data enthusiasts can utilize this dataset for a variety of purposes, including but not limited to:
Performance Analysis: Assess the on-time performance of different carriers at specific airports and identify potential areas for improvement.
Trend Identification: Analyze temporal trends in delays, cancellations, and diversions to understand whether certain months or periods exhibit higher operational challenges.
Root Cause Analysis: Investigate the primary contributors to delays, such as carrier-related issues, weather conditions, NAS inefficiencies, security concerns, or late aircraft arrivals.
Benchmarking: Compare the performance of various carriers across different airports to identify industry leaders and areas requiring attention.
Predictive Modeling: Use historical data to develop predictive models for flight delays, aiding in the development of strategies to mitigate disruptions.
Industry Insights: Contribute to a broader understanding of the factors influencing operational efficiency within the U.S. aviation sector.
As users explore and analyze the dataset, they can gain valuable insights that may inform decision-making processes, improve operational strategies, and contribute to a more efficient and reliable air travel experience.
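As a rough illustration of the benchmarking use case, the sketch below computes each carrier's share of arrivals delayed by more than 15 minutes from rows keyed by year, month, carrier, and airport. The field names (`arr_flights`, `arr_del15`) and all values are assumptions made for the example; check them against the actual file headers before reuse.

```python
from collections import defaultdict

# Hypothetical rows: one per (year, month, carrier, airport) combination.
# Field names and numbers are illustrative assumptions, not real data.
rows = [
    {"year": 2023, "month": 8, "carrier": "AA", "airport": "JFK",
     "arr_flights": 1000, "arr_del15": 250},
    {"year": 2023, "month": 8, "carrier": "AA", "airport": "LAX",
     "arr_flights": 500, "arr_del15": 50},
    {"year": 2023, "month": 8, "carrier": "DL", "airport": "JFK",
     "arr_flights": 800, "arr_del15": 80},
]


def delayed_share_by_carrier(rows):
    """Return {carrier: delayed arrivals / total arrivals} across all rows."""
    flights = defaultdict(int)
    delayed = defaultdict(int)
    for row in rows:
        flights[row["carrier"]] += row["arr_flights"]
        delayed[row["carrier"]] += row["arr_del15"]
    # Skip carriers with no arrivals to avoid dividing by zero.
    return {c: delayed[c] / flights[c] for c in flights if flights[c] > 0}


shares = delayed_share_by_carrier(rows)
print(shares)
```

Summing counts before dividing weights each airport by its traffic, which is usually preferable to averaging per-airport percentages.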
http://opendatacommons.org/licenses/dbcl/1.0/
The Flights Booking Dataset of various airlines was scraped date-wise from a well-known website and stored in a structured format. The dataset contains records of flight travel details between cities in India. It includes multiple features such as Source & Destination City, Arrival & Departure Time, Duration, and Price of the flight.
This data is available as a CSV file. We are going to analyze this data set using the Pandas DataFrame.
This analysis will be helpful for those working in the airline and travel domains.
Using this dataset, we answered multiple questions with Python in our Project.
Q.1. What are the airlines in the dataset, accompanied by their frequencies?
Q.2. Show Bar Graphs representing the Departure Time & Arrival Time.
Q.3. Show Bar Graphs representing the Source City & Destination City.
Q.4. Does price vary with airlines?
Q.5. Does ticket price change based on the departure time and arrival time?
Q.6. How does the price change with a change in Source and Destination?
Q.7. How is the price affected when tickets are bought in just 1 or 2 days before departure?
Q.8. How does the ticket price vary between Economy and Business class?
Q.9. What is the average price of a Vistara flight from Delhi to Hyderabad in Business class?
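Questions such as Q.1 and Q.9 reduce to a count and a filtered mean in pandas. The following is a minimal sketch on a tiny made-up frame; the column names (`airline`, `source_city`, `destination_city`, `class`, `price`) are assumed spellings of the features described in this section, and the prices are invented for illustration.

```python
import pandas as pd

# Illustrative rows only; the real data would be loaded with pd.read_csv(...).
flights = pd.DataFrame({
    "airline": ["Vistara", "Vistara", "Vistara", "Air_India"],
    "source_city": ["Delhi", "Delhi", "Delhi", "Delhi"],
    "destination_city": ["Hyderabad", "Hyderabad", "Hyderabad", "Hyderabad"],
    "class": ["Business", "Business", "Economy", "Business"],
    "price": [50000, 60000, 5000, 40000],
})

# Q.1: airlines and their frequencies.
airline_counts = flights["airline"].value_counts()

# Q.9: average Business-class price for Vistara from Delhi to Hyderabad.
mask = (
    (flights["airline"] == "Vistara")
    & (flights["source_city"] == "Delhi")
    & (flights["destination_city"] == "Hyderabad")
    & (flights["class"] == "Business")
)
average_price = flights.loc[mask, "price"].mean()

print(airline_counts)
print(average_price)
```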
These are the main Features/Columns available in the dataset :
1) Airline: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines.
2) Flight: Flight stores information regarding the plane's flight code. It is a categorical feature.
3) Source City: City from which the flight takes off. It is a categorical feature having 6 unique cities.
4) Departure Time: This is a derived categorical feature created by grouping time periods into bins. It stores information about the departure time and has 6 unique time labels.
5) Stops: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.
6) Arrival Time: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time.
7) Destination City: City where the flight will land. It is a categorical feature having 6 unique cities.
8) Class: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy.
9) Duration: A continuous feature that displays the overall amount of time it takes to travel between cities in hours.
10) Days Left: This is a derived feature calculated by subtracting the booking date from the trip date.
11) Price: The target variable, which stores the ticket price.
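The two derived features above (the binned time labels and Days Left) can be sketched as follows. The bin edges and label names are hypothetical, since the description only states that there are six time labels; the date arithmetic follows the Days Left definition directly.

```python
from datetime import date

# Hypothetical bins: six labels covering the 24 hours of a day.
# The actual edges and label spellings used in the dataset are not documented here.
TIME_BINS = [
    (0, 4, "Late_Night"),
    (4, 8, "Early_Morning"),
    (8, 12, "Morning"),
    (12, 16, "Afternoon"),
    (16, 20, "Evening"),
    (20, 24, "Night"),
]


def time_label(hour: int) -> str:
    """Map an hour of day (0-23) to one of the six assumed time labels."""
    for start, end, label in TIME_BINS:
        if start <= hour < end:
            return label
    raise ValueError(f"hour out of range: {hour}")


def days_left(booking_date: date, trip_date: date) -> int:
    """Days between booking and travel: trip date minus booking date."""
    return (trip_date - booking_date).days


print(time_label(9))
print(days_left(date(2022, 2, 10), date(2022, 2, 12)))
```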
https://creativecommons.org/publicdomain/zero/1.0/
Have you taken a flight in the U.S. in the past 15 years? If so, then you are a part of monthly data that the U.S. Department of Transportation's TranStats service makes available on various metrics for 15 U.S. airlines and 30 major U.S. airports. Their website unfortunately does not include a method for easily downloading and sharing files. Furthermore, the source is built in ASP.NET, so extracting the data is rather cumbersome. To allow easier community access to this rich source of information, I scraped the metrics for every airline / airport combination and stored them in separate CSV files.
Occasionally, an airline doesn't serve a certain airport, or it didn't serve it for the entire duration that the data collection period covers*. In those cases, the data either doesn't exist or is typically too sparse to be of much use. As such, I've only uploaded complete files for airports that an airline served for the entire uninterrupted duration of the collection period. For these files, there should be 174 time series points for one or more of the nine columns below. I recommend any of the files for American, Delta, or United Airlines for outstanding examples of complete and robust airline data.
* No data for Atlas Air exists, and Virgin America commenced service in 2007, so no folders for either airline are included.
There are 13 airlines that have at least one complete dataset. Each airline's folder includes CSV file(s) for each airport that are complete as defined by the above criteria. I've double-checked the files, but if you find one that violates the criteria, please point it out. The file names have the format "AIRLINE-AIRPORT.csv", where both AIRLINE and AIRPORT are IATA codes. For a full listing of the airlines and airports that the codes correspond to, check out the airline_codes.csv or airport_codes.csv files that are included, or perform a lookup here. Note that the data in each airport file represents metrics for flights that originated at the airport.
Among the 13 airlines in data.zip, there are a total of 161 individual datasets. There are also two special folders included - airlines_all_airports.csv and airports_all_airlines.csv. The first contains datasets for each airline aggregated over all airports, while the second contains datasets for each airport aggregated over all airlines. To preview a sample dataset, check out all_airlines_all_airports.csv, which contains industry-wide data.
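Because the per-airport file names follow the "AIRLINE-AIRPORT.csv" pattern, the two IATA codes can be recovered from the path alone. A minimal sketch follows; the example path is hypothetical and only mirrors the stated naming format.

```python
from pathlib import Path
from typing import Tuple


def parse_dataset_name(path: str) -> Tuple[str, str]:
    """Split an 'AIRLINE-AIRPORT.csv' file name into its two IATA codes."""
    stem = Path(path).stem  # file name without directory or extension
    if "-" not in stem:
        # Aggregated files do not follow the AIRLINE-AIRPORT pattern.
        raise ValueError(f"not an airline/airport file: {path}")
    airline, airport = stem.split("-", 1)
    return airline, airport


# Hypothetical path, used only to show the expected shape of the result.
print(parse_dataset_name("data/AA/AA-ATL.csv"))
```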
Each file includes the following metrics for each month from October 2002 to March 2017:
* Frequently contains missing values
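The figure of 174 time series points quoted for complete files follows from the collection period of October 2002 to March 2017, counted inclusively by month. A small check of that arithmetic:

```python
def months_inclusive(start_year: int, start_month: int,
                     end_year: int, end_month: int) -> int:
    """Number of calendar months from start to end, counting both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# October 2002 through March 2017, as stated in the description.
n_points = months_inclusive(2002, 10, 2017, 3)
print(n_points)  # 174
```

A file with fewer rows than this for a given metric is therefore not complete under the criteria described above.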
Thanks to the U.S. Department of Transportation for collecting this data every month and making it publicly available to us all.
Source: https://www.transtats.bts.gov/Data_Elements.aspx
The airline / airport datasets are perfect for practicing and/or testing time series forecasting with classic statistical models such as autoregressive integrated moving average (ARIMA), or modern deep learning techniques such as long short-term memory (LSTM) networks. The datasets typically show evidence of trends, seasonality, and noise, so modeling and accurate forecasting can be challenging, but still more tractable than time series problems possessing more stochastic elements, e.g. stocks, currencies, commodities, etc. The source releases new data each month, so feel free to check your models' performances against new data as it comes out. I will update the files here every 3 to 6 months depending on how things go.
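Before fitting ARIMA or LSTM models, it can help to have a simple reference point. The sketch below is not ARIMA; it is a seasonal-naive baseline (repeat the value from the same month one year earlier) with a mean absolute error helper, shown on a made-up series. Any more elaborate model should be able to beat it on held-out months.

```python
def seasonal_naive_forecast(series, season_length=12, horizon=12):
    """Forecast by repeating the last observed season (e.g. last 12 months)."""
    if len(series) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = series[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly values: 24 months of history, forecast the next 3 months.
history = list(range(1, 25))
forecast = seasonal_naive_forecast(history, season_length=12, horizon=3)
print(forecast)
print(mean_absolute_error([25, 26, 27], forecast))
```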
A future plan is to build a SQLite database so a vast array of queries can be run against the data. The data in its current time series format is not conducive to this, so coming up with a workable structure for the tables is the first step towards this goal. If you have any suggestions for how I can improve the data presentation, or anything that you would like me to add, please let me know. Looking forward to seeing the questions that we can answer together!
This dataset contains scheduled and actual departure and arrival times reported by certified US air carriers that account for at least 1% of domestic scheduled passenger revenues. The data was collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS). The dataset contains the date, time, origin, destination, airline, distance, and delay status of flights between 2016 and 2018. The report, focusing on data from 2016-2018, estimated that air transportation delays put a 4 billion dollar dent in the country's gross domestic product that year. The full report can be found here. To explore what drives these delays, we are going to analyze the provided dataset, which contains up to 18 million domestic US flights for 2016-2018 and their causes of delay, diversion, and cancellation, if any. The data comes from the U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics (BTS).
This dataset is composed of the following variables:

| Number | Column Name | Description |
|---|---|---|
| 1 | Year | 2016, 2017, 2018 |
| 2 | Month | 1-12 |
| 3 | DayofMonth | 1-31 |
| 4 | DayOfWeek | 1 (Monday) - 7 (Sunday) |
| 5 | DepTime | actual departure time (local, hhmm) |
| 6 | CRSDepTime | scheduled departure time (local, hhmm) |
| 7 | ArrTime | actual arrival time (local, hhmm) |
| 8 | CRSArrTime | scheduled arrival time (local, hhmm) |
| 9 | ActualElapsedTime | in minutes |
| 10 | CRSElapsedTime | in minutes |
| 11 | AirTime | in minutes |
| 12 | ArrDelay | arrival delay, in minutes: A flight is counted as "on time" if it operated less than 15 minutes later than the scheduled time shown in the carriers' Computerized Reservations Systems (CRS). |
| 13 | DepDelay | departure delay, in minutes |
| 14 | Origin | origin IATA airport code |
| 15 | Dest | destination IATA airport code |
| 16 | Distance | in miles |
| 17 | TaxiIn | taxi in time, in minutes |
| 18 | TaxiOut | taxi out time, in minutes |
| 19 | Cancelled | was the flight cancelled |
| 20 | CancellationCode | reason for cancellation (A = carrier, B = weather, C = NAS, D = security) |
| 21 | Diverted | 1 = yes, 0 = no |
| 22 | CarrierDelay | in minutes: Carrier delay is within the control of the air carrier. Examples of occurrences that may determine carrier delay are: aircraft cleaning, aircraft damage, awaiting the arrival of connecting passengers or crew, baggage, bird strike, cargo loading, catering, computer, outage-carrier equipment, crew legality (pilot or attendant rest), damage by hazardous goods, engineering inspection, fuelling, handling disabled passengers, late crew, lavatory servicing, maintenance, oversales, potable water servicing, removal of unruly passenger, slow boarding or seating, stowing carry-on baggage, weight and balance delays. |
| 23 | WeatherDelay | in minutes: Weather delay is caused by extreme or hazardous weather conditions that are forecasted or manifest themselves at the point of departure, en route, or at the point of arrival. |
| 24 | NASDelay | in minutes: Delay that is within the control of the National Airspace System (NAS) may include: non-extreme weather conditions, airport operations, heavy traffic volume, air traffic control, etc. |
| 25 | SecurityDelay | in minutes: Security delay is caused by evacuation of a terminal or concourse, re-boarding of aircraft because of a security breach, inoperative screening equipment and/or long lines in excess of 29 minutes at screening areas. |
| 26 | LateAircraftDelay | in minutes: Arrival delay at an airport due to the late arrival of the same aircraft at a previous airport. The ripple effect of an earlier delay at downstream airports is referred to as delay propagation. |
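Several of the variables above need a small amount of decoding before analysis: times are stored as local hhmm integers, the on-time rule is a threshold on ArrDelay, and cancellation reasons are single-letter codes. The helpers below are a minimal sketch of that decoding; the function names are illustrative and not part of the dataset.

```python
# Cancellation codes as listed in the variable descriptions.
CANCELLATION_REASONS = {"A": "carrier", "B": "weather", "C": "NAS", "D": "security"}


def hhmm_to_minutes(hhmm: int) -> int:
    """Convert a local hhmm integer (e.g. 1435) to minutes after midnight."""
    hours, minutes = divmod(hhmm, 100)
    if not (0 <= hours <= 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid hhmm value: {hhmm}")
    return hours * 60 + minutes


def is_on_time(arr_delay_minutes: float) -> bool:
    """On time means arriving less than 15 minutes after the scheduled time."""
    return arr_delay_minutes < 15


print(hhmm_to_minutes(1435))
print(is_on_time(14), is_on_time(15))
print(CANCELLATION_REASONS["B"])
```

Note that departure and arrival times are local, so subtracting them does not give a flight duration across time zones; the elapsed-time columns are the safer choice for that.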
A. SUMMARY
San Francisco International Airport Report on Monthly Passenger Traffic Statistics by Airline.
B. HOW THE DATASET IS CREATED
Data is self-reported by airlines and is only available at a monthly level.
C. UPDATE PROCESS
Data is updated quarterly.
D. HOW TO USE THIS DATASET
Airport data is seasonal in nature, therefore any comparative analyses should be done on a period-over-period basis (i.e. January 2010 vs. January 2009) as opposed to period-to-period (i.e. January 2010 vs. February 2010). It is also important to note that fact and attribute field relationships are not always 1-to-1. For example, Passenger Counts belonging to United Airlines will appear in multiple attribute fields and are additive, which provides flexibility for the user to derive categorical Passenger Counts as desired.
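The period-over-period guidance can be sketched as follows: passenger counts for an airline are first summed across attribute rows (since they are additive), and a month is then compared with the same month one year earlier. The row layout and all numbers are illustrative assumptions, not values from the report.

```python
from collections import defaultdict

# Hypothetical rows: (year, month, airline, passenger_count).
# One airline can appear in several rows for the same month.
rows = [
    (2009, 1, "United Airlines", 1200),
    (2009, 1, "United Airlines", 800),
    (2010, 1, "United Airlines", 1500),
    (2010, 1, "United Airlines", 700),
    (2010, 2, "United Airlines", 1900),
]


def monthly_totals(rows, airline):
    """Sum the additive passenger counts per (year, month) for one airline."""
    totals = defaultdict(int)
    for year, month, name, count in rows:
        if name == airline:
            totals[(year, month)] += count
    return dict(totals)


def year_over_year_change(totals, year, month):
    """Relative change versus the same month of the previous year."""
    current = totals[(year, month)]
    previous = totals[(year - 1, month)]
    return (current - previous) / previous


totals = monthly_totals(rows, "United Airlines")
yoy = year_over_year_change(totals, 2010, 1)
print(totals)
print(yoy)
```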
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Daily data showing UK flight numbers and rolling seven-day average, including flights to, from, and within the UK. These are official statistics in development. Source: EUROCONTROL.
Motivation
The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data has been periodically included in the dataset until the end of the COVID-19 pandemic.
We stopped updating the dataset after December 2022. Previous files have been fixed after a thorough sanity check.
License
See LICENSE.txt
Disclaimer
The data provided in the files is provided as is. Despite our best efforts at filtering out potential issues, some information could be erroneous.
Origin and destination airports are computed online based on the ADS-B trajectories on approach/takeoff: no crosschecking with external sources of data has been conducted. Fields origin or destination are empty when no airport could be found.
Aircraft information comes from the OpenSky aircraft database. Fields typecode and registration are empty when the aircraft is not present in the database.
Description of the dataset
One file per month is provided as a csv file with the following features:
callsign: the identifier of the flight displayed on ATC screens (usually the first three letters are reserved for an airline: AFR for Air France, DLH for Lufthansa, etc.)
number: the commercial number of the flight, when available (the matching with the callsign comes from public open API); this field may not be very reliable;
icao24: the transponder unique identification number;
registration: the aircraft tail number (when available);
typecode: the aircraft model type (when available);
origin: a four letter code for the origin airport of the flight (when available);
destination: a four letter code for the destination airport of the flight (when available);
firstseen: the UTC timestamp of the first message received by the OpenSky Network;
lastseen: the UTC timestamp of the last message received by the OpenSky Network;
day: the UTC day of the last message received by the OpenSky Network;
latitude_1, longitude_1, altitude_1: the first detected position of the aircraft;
latitude_2, longitude_2, altitude_2: the last detected position of the aircraft.
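Two quantities are commonly derived from the features listed above: the observed duration of a flight (lastseen minus firstseen) and the likely airline prefix of the callsign. The sketch below assumes the timestamps are ISO 8601 strings with a UTC offset; that format is an assumption and should be checked against the actual files.

```python
from datetime import datetime


def flight_duration_minutes(firstseen: str, lastseen: str) -> float:
    """Minutes between the first and last message received for a flight."""
    start = datetime.fromisoformat(firstseen)
    end = datetime.fromisoformat(lastseen)
    return (end - start).total_seconds() / 60


def airline_prefix(callsign: str) -> str:
    """First three characters of the callsign, usually an airline designator."""
    return callsign.strip()[:3]


# Illustrative values only.
print(flight_duration_minutes("2019-01-01 10:00:00+00:00",
                              "2019-01-01 11:30:00+00:00"))
print(airline_prefix("AFR1234"))
```

Since the first three letters are only usually reserved for an airline, the prefix is best treated as a hint rather than a guaranteed operator code.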
Examples
Possible visualisations and a more detailed description of the data are available at the following page:
Credit
If you use this dataset, please cite:
Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, and Vincent Lenders "Crowdsourced air traffic data from the OpenSky Network 2019–2020" Earth System Science Data 13(2), 2021 https://doi.org/10.5194/essd-13-357-2021
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large data set of go-arounds, also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop flights with n_approaches > 2.
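A minimal sketch of this filtering step is shown below, using only the standard library. The in-memory sample rows are invented for illustration; for the real file, the same loader could be given a handle from `gzip.open("go_arounds_minimal.csv.gz", "rt", newline="")`.

```python
import csv
import io

# Invented sample rows with the documented minimal columns.
sample = io.StringIO(
    "time,icao24,callsign,airport,runway,has_ga,n_approaches,n_rwy_approached\n"
    "2019-01-01 10:00:00,4b1800,SWR123,LSZH,14,False,1,1\n"
    "2019-01-01 10:05:00,3c6444,DLH456,LSZH,14,True,2,1\n"
    "2019-01-01 10:10:00,4b1a2b,HBABC,LSZH,28,True,7,2\n"
    "2019-01-01 10:15:00,400f01,EZY789,LSZH,14,False,1,1\n"
)


def load_landings(handle):
    """Read landings and convert has_ga and the approach counts to native types."""
    landings = []
    for row in csv.DictReader(handle):
        row["has_ga"] = row["has_ga"] == "True"  # stored as the strings "True"/"False"
        row["n_approaches"] = int(row["n_approaches"])
        row["n_rwy_approached"] = int(row["n_rwy_approached"])
        landings.append(row)
    return landings


landings = load_landings(sample)

# Keep flights with at most two approaches to drop likely training flights.
regular = [row for row in landings if row["n_approaches"] <= 2]
ga_rate = sum(row["has_ga"] for row in regular) / len(regular)

print(len(landings), len(regular), ga_rate)
```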
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
| registration | string | Aircraft registration |
| typecode | string | Aircraft ICAO typecode |
| icaoaircrafttype | string | ICAO aircraft type |
| wtc | string | ICAO wake turbulence category |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
| operator_country | string | ISO Alpha-3 country code of the operator |
| operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
| wind_speed_knts | integer | METAR, surface wind speed in knots |
| wind_dir_deg | integer | METAR, surface wind direction in degrees |
| wind_gust_knts | integer | METAR, surface wind gust speed in knots |
| visibility_m | float | METAR, visibility in m |
| temperature_deg | integer | METAR, temperature in degrees Celsius |
| press_sea_level_p | float | METAR, sea level pressure in hPa |
| press_p | float | METAR, QNH in hPa |
| weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
| weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
| weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
| weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
| weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| n_landings | integer | Total number of landings observed on this runway in 2019 |
| ga_rate | float | Go-around rate, per 1000 landings |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
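As an illustration of how n_landings and ga_rate relate to the per-landing files, the sketch below aggregates a small made-up sample by airport and runway. It is not the authors' pipeline: the function name aggregate_go_arounds is hypothetical, and has_ga is assumed to have already been parsed into booleans.

```python
import pandas as pd


def aggregate_go_arounds(landings: pd.DataFrame) -> pd.DataFrame:
    """Count landings per airport-runway and express go-arounds per 1000 landings."""
    grouped = landings.groupby(["airport", "runway"])["has_ga"]
    agg = grouped.agg(n_landings="size", n_ga="sum").reset_index()
    agg["ga_rate"] = 1000.0 * agg["n_ga"] / agg["n_landings"]
    return agg.drop(columns="n_ga")


# Made-up landings; has_ga is assumed to be boolean here.
sample = pd.DataFrame(
    {
        "airport": ["EGLC", "EGLC", "EGLC", "EGLC", "LSZH", "LSZH"],
        "runway": ["09", "09", "09", "09", "14", "14"],
        "has_ga": [True, False, False, False, False, False],
    }
)

rates = aggregate_go_arounds(sample)
print(rates)
```

In this sample, EGLC runway 09 has one go-around in four landings, which gives a rate of 250 per 1000 landings.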
Downloading the trajectories
Users of this data set with access to the OpenSky Network's Impala shell can download the trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds of the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
    import datetime

    import pandas as pd
    from tqdm.auto import tqdm
    from traffic.core import Traffic
    from traffic.data import opensky

    df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
    df["time"] = pd.to_datetime(df["time"])

    airport = "EGLC"
    start = datetime.datetime(year=2019, month=1, day=4).replace(
        tzinfo=datetime.timezone.utc
    )
    stop = datetime.datetime(year=2019, month=1, day=5).replace(
        tzinfo=datetime.timezone.utc
    )

    df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

    flights = []
    delta_time = pd.Timedelta(minutes=10)
    for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
        # take at most 10 minutes before and 10 minutes after the landing or go-around
        start_time = row["time"] - delta_time
        stop_time = row["time"] + delta_time

        # fetch the data from OpenSky Network
        flights.append(
            opensky.history(
                start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
                stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
                callsign=row["callsign"],
                return_flight=True,
            )
        )

    Traffic.from_flights(flights)
Additional files
Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:
validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.
validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
The National Airspace System (NAS) is an ever-changing and complex engineering system. As the Next Generation Air Transportation System (NextGen) is developed, there will be an increased emphasis on safety and on operational and environmental efficiency. Current operations in the NAS are monitored using a variety of data sources, including data from flight recorders, radar track data, weather data, and other massive data collection systems. Although numerous technologies exist to monitor the frequency of known but undesirable behaviors in the NAS, there are currently few methods that can analyze the large repositories to discover new and previously unknown events in the NAS. A tool that discovers events with implications for safety, or incidents of operational importance, increases the awareness of such scenarios in the community and helps to broaden the overall safety of the NAS, whereas monitoring only the frequency of known events can provide mitigations only for already established problems. This paper discusses a novel approach that uses radar-track data to discover operationally significant events in the NAS that are currently not monitored and have potential safety and/or efficiency implications. It presents the discovery algorithm and describes in detail some flights of interest, with comments from subject matter experts who are familiar with the operations in the airspace that was studied.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY The San Francisco International Airport (SFO) air traffic cargo dataset contains data about cargo volume into and out of SFO, in both metric tons and pounds, with monthly totals by airline, region and aircraft type.
B. HOW THE DATASET IS CREATED Data is self-reported by airlines and is only available at a monthly level.
C. UPDATE PROCESS Data is available starting in July 1999 and will be updated monthly.
D. HOW TO USE THIS DATASET Airport data is seasonal in nature; therefore, any comparative analyses should be done on a period-over-period basis (e.g. January 2010 vs. January 2009) as opposed to period-to-period (e.g. January 2010 vs. February 2010). It is also important to note that fact and attribute field relationships are not always 1-to-1. For example, Cargo Statistics belonging to United Airlines will appear in multiple attribute fields and are additive, which provides flexibility for the user to derive categorical Cargo Statistics as desired.
E. RELATED DATASETS A summary of monthly comparative air-traffic statistics is also available on SFO’s internet site at https://www.flysfo.com/about/media/facts-statistics/air-traffic-statistics
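The period-over-period comparison recommended in section D can be sketched as follows. The column names period and cargo_metric_tons, the helper year_over_year, and the sample values are all assumptions made for illustration; the sketch also assumes the data has already been summed to one row per month.

```python
import pandas as pd


def year_over_year(monthly: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Compare each month with the same month one year earlier."""
    out = monthly.sort_values("period").copy()
    # Shift a copy of the data forward by one year so that it lines up
    # with the month it should be compared against.
    previous = out[["period", value_col]].copy()
    previous["period"] = previous["period"] + pd.DateOffset(years=1)
    previous = previous.rename(columns={value_col: "previous_year"})
    out = out.merge(previous, on="period", how="left")
    out["yoy_change_pct"] = (
        100.0 * (out[value_col] - out["previous_year"]) / out["previous_year"]
    )
    return out


# Made-up monthly totals (one row per month).
monthly = pd.DataFrame(
    {
        "period": pd.to_datetime(
            ["2009-01-01", "2009-02-01", "2010-01-01", "2010-02-01"]
        ),
        "cargo_metric_tons": [100.0, 80.0, 110.0, 60.0],
    }
)

yoy = year_over_year(monthly, "cargo_metric_tons")
print(yoy)
```

Months without a counterpart one year earlier get a missing value instead of a comparison against the preceding month, which avoids mixing seasonal effects into the result.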
This dataset contains the records of all the flights in the Northern California TRACON. The data was provided by the Aircraft Noise Abatement Office (http://www.flyquietsfo.com/) of San Francisco International Airport. The data cover January to March 2006 and are organized by day and flight. Each record contains some information about the flight and a sequence of 3D positions and estimated speeds. The data contains thousands of trajectories that can be used for trajectory clustering. The Aircraft Noise Abatement Office uses the data to analyze the trajectories of aircraft flying in and out of SFO, with the objective of minimizing the noise pollution due to aircraft in the San Francisco Bay Area.
The files have the extension "lt6" and are organized as follows, one file per day. Line number and explanation:
1. TRACK OPNUM (TRACK header word and operation number)
2. eventid (correlation number)
3. trackstart date (in time since 1900, A8 version four year digit)
4. trackstart time HH:MM:SS
5. trackend time HH:MM:SS
6. airportid
7. ACID (FLIGHTNUM/TAILNUMBER)
8. owner name
9. aircrafttype
10. aircraft category
11. beacon
12. adflag
13. waypoint
14. other_port (dest/origin)
15. runwayname
16. min alt
17. max alt
18. min range
19. max range
20. Count of trackpoints (to follow)
21. x,y,z,v,t (all points in meters relative to MRP, velocity, and time from start of track)
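A reader for this record layout could look roughly like the sketch below. It rests on assumptions the description does not confirm: that each of the 20 header fields sits on its own line, that every trackpoint is one comma-separated x,y,z,v,t line, and that a record starts with a line beginning with TRACK. The field names, the Track class, and the sample record are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical names for the 20 header lines, in the documented order.
HEADER_FIELDS = [
    "track_opnum", "eventid", "trackstart_date", "trackstart_time",
    "trackend_time", "airportid", "acid", "owner_name", "aircrafttype",
    "aircraft_category", "beacon", "adflag", "waypoint", "other_port",
    "runwayname", "min_alt", "max_alt", "min_range", "max_range",
    "n_trackpoints",
]


@dataclass
class Track:
    header: Dict[str, str]
    points: List[Tuple[float, ...]]  # (x, y, z, v, t) per trackpoint


def parse_tracks(lines: Iterable[str]) -> Iterator[Track]:
    """Yield one Track per record, assuming one field or trackpoint per line."""
    it = iter(line.strip() for line in lines)
    for first in it:
        if not first.startswith("TRACK"):
            continue  # skip anything that is not the start of a record
        values = [first] + [next(it) for _ in range(len(HEADER_FIELDS) - 1)]
        header = dict(zip(HEADER_FIELDS, values))
        n_points = int(header["n_trackpoints"])
        points = [
            tuple(float(v) for v in next(it).split(",")) for _ in range(n_points)
        ]
        yield Track(header, points)


# Made-up record with two trackpoints.
sample_lines = [
    "TRACK 1001", "E42", "20060101", "08:15:00", "08:32:10", "SFO",
    "UAL123", "UNITED", "B738", "J", "4521", "A", "WP1", "LAX", "28L",
    "0", "3000", "0", "50000", "2",
    "100.0,200.0,30.0,70.0,0.0",
    "150.0,260.0,45.0,72.0,4.0",
]

tracks = list(parse_tracks(sample_lines))
print(len(tracks), tracks[0].header["airportid"], len(tracks[0].points))
```

If the real files use a different delimiter or put several fields on one line, only the splitting logic inside parse_tracks would need to change.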
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY San Francisco International Airport (SFO) keeps track of historical flight operations, also known as aircraft RADAR data, for analysis and reporting.
B. HOW THE DATASET IS CREATED Details of flights from the Federal Aviation Administration’s National Offload Program are processed into SFO’s Airport Noise and Operations Management System (ANOMS), where they are correlated with noise reports from the communities and with noise levels collected from noise monitor sites on the San Francisco Peninsula. In ANOMS, various analysis gates (imaginary vertical curtains in space) are used to identify which routes flights flew when departing from and arriving at SFO. The system serves to quantify, analyze, and respond to noise concerns, and to report on Runway Use and various programs to reduce aircraft noise in communities surrounding SFO.
C. UPDATE PROCESS Data is available starting in August 2019 and will be updated monthly.
D. HOW TO USE THIS DATASET It is important to note that this dataset covers flights departing from and landing at SFO only, not flight activities associated with other airports in the Bay Area region. This information is the data source used to produce the Flight Operations sections (pages 3-5) of the Airport Director’s Report. These reports are presented at the SFO Airport Community Roundtable Meetings and are available online at https://noise.flysfo.com/reports/?category=airport-directors-report
E. RELATED DATASETS Unique Flight Operations - This filtered view contains unique records of flight operations. For example, one record for a flight that departed SFO or one record for a flight that landed at SFO.
Arrival and Departure Routes - This filtered view contains records of flights with details of analysis gate(s) the aircraft flight track penetrates, to derive which route was used to depart and land at SFO.
This dataset contains Operations and Arrival and Departure Routes joined on operation_number. The field gate_penetration is derived by ordering the arrival and departure routes for each operation over gate_penetration_time. Unique_identifier is then created by joining operation_number and gate_penetration.
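The derivation described above can be approximated with pandas as shown below. This is a sketch under assumptions, not SFO's actual processing: gate_penetration is taken to be a 1-based rank within each operation, the identifier parts are joined with an underscore, and the helper name build_joined_view, the gate column, and the sample rows are hypothetical.

```python
import pandas as pd


def build_joined_view(operations: pd.DataFrame, routes: pd.DataFrame) -> pd.DataFrame:
    """Join operations with routes and derive gate_penetration and unique_identifier."""
    joined = operations.merge(routes, on="operation_number", how="inner")
    joined = joined.sort_values(["operation_number", "gate_penetration_time"])
    # Assumption: gate_penetration is the 1-based order of gates per operation.
    joined["gate_penetration"] = joined.groupby("operation_number").cumcount() + 1
    # Assumption: the two parts are joined with an underscore.
    joined["unique_identifier"] = (
        joined["operation_number"].astype(str)
        + "_"
        + joined["gate_penetration"].astype(str)
    )
    return joined.reset_index(drop=True)


# Made-up rows for illustration.
operations = pd.DataFrame(
    {"operation_number": [1, 2], "operation_type": ["Arrival", "Departure"]}
)
routes = pd.DataFrame(
    {
        "operation_number": [1, 1, 2],
        "gate": ["A", "B", "C"],
        "gate_penetration_time": pd.to_datetime(
            ["2019-08-01 10:05", "2019-08-01 10:01", "2019-08-01 11:00"]
        ),
    }
)

view = build_joined_view(operations, routes)
print(view[["operation_number", "gate", "gate_penetration", "unique_identifier"]])
```

For operation 1, gate B is penetrated before gate A, so B receives gate_penetration 1 and A receives 2.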
Other provided datasets are Aircraft Noise Reports, Late Night Aircraft Departures, Air Carrier Runway Use, Late Night Preferential Runway Use, Aircraft Noise Climates, and Noise Exceedance Rating.
Please contact the Noise Abatement Office at NoiseAbatementOffice@flysfo.com for any questions regarding this data.
Date created: November 17, 2023
This dataset contains information about the number of flights, passengers, and cargo at Saudi Arabia's domestic airports for 2016-2019. Data from the General Authority for Statistics. Export API data for more datasets to advance energy economics research. Source: Saudi Arabian Airlines Organization.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data that looks at how market structure affects delays for US domestic flights between the years 2004 - 2017.
Data on airline delays come from the Airline On-Time Performance Data (OTPD) from the US Bureau of Transportation Statistics. The data on tail numbers and seat capacity come from the Federal Aviation Administration Aircraft Registry. The data on flight-related weather come from the Local Climatological Data (LCD) provided by the National Centers for Environmental Information.
https://creativecommons.org/publicdomain/zero/1.0/
Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.
This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.
The dataset provided here is a simulated example and was generated using the online platform found at Mockaroo. This web-based tool offers a service that enables the creation of customizable Synthetic datasets that closely resemble real data. It is primarily intended for use by developers, testers, and data experts who require sample data for a range of uses, including testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.
Cover Photo by: Kevin Woblick on Unsplash
Thumbnail by: Airplane icons created by Freepik - Flaticon