The number of flights performed globally by the airline industry has increased steadily since the early 2000s and reached **** million in 2019. However, due to the coronavirus pandemic, the number of flights dropped to **** million in 2020. The flight volume increased again in the following years and was forecasted to reach ** million in 2025.
https://creativecommons.org/publicdomain/zero/1.0/
Have you taken a flight in the U.S. in the past 15 years? If so, then you are a part of monthly data that the U.S. Department of Transportation's TranStats service makes available on various metrics for 15 U.S. airlines and 30 major U.S. airports. Their website unfortunately does not include a method for easily downloading and sharing files. Furthermore, the source is built in ASP.NET, so extracting the data is rather cumbersome. To allow easier community access to this rich source of information, I scraped the metrics for every airline / airport combination and stored them in separate CSV files.
Occasionally, an airline doesn't serve a certain airport, or it didn't serve it for the entire duration that the data collection period covers*. In those cases, the data either doesn't exist or is typically too sparse to be of much use. As such, I've only uploaded complete files for airports that an airline served for the entire uninterrupted duration of the collection period. For these files, there should be 174 time series points for one or more of the nine columns below. I recommend any of the files for American, Delta, or United Airlines for outstanding examples of complete and robust airline data.
* No data for Atlas Air exists, and Virgin America commenced service in 2007, so no folders for either airline are included.
There are 13 airlines that have at least one complete dataset. Each airline's folder includes CSV file(s) for each airport that are complete as defined by the above criteria. I've double-checked the files, but if you find one that violates the criteria, please point it out. The file names have the format "AIRLINE-AIRPORT.csv", where both AIRLINE and AIRPORT are IATA codes. For a full listing of the airlines and airports that the codes correspond to, check out the airline_codes.csv or airport_codes.csv files that are included, or perform a lookup here. Note that the data in each airport file represents metrics for flights that originated at the airport.
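The file naming convention can be split back into its two codes with a few lines of Python. This is only a sketch: the helper name `parse_dataset_name` and the file name `AA-ATL.csv` are illustrative assumptions, not confirmed entries in data.zip.

```python
from pathlib import Path


def parse_dataset_name(filename: str) -> tuple[str, str]:
    """Split an 'AIRLINE-AIRPORT.csv' file name into its two IATA codes."""
    stem = Path(filename).stem  # drops the '.csv' extension
    airline, sep, airport = stem.partition("-")
    if not sep or not airline or not airport:
        raise ValueError(f"not in AIRLINE-AIRPORT.csv format: {filename!r}")
    return airline, airport


# Hypothetical file name, used only to show the format.
print(parse_dataset_name("AA-ATL.csv"))
```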
Among the 13 airlines in data.zip, there are a total of 161 individual datasets. There are also two special folders included - airlines_all_airports.csv and airports_all_airlines.csv. The first contains datasets for each airline aggregated over all airports, while the second contains datasets for each airport aggregated over all airlines. To preview a sample dataset, check out all_airlines_all_airports.csv, which contains industry-wide data.
Each file includes the following metrics for each month from October 2002 to March 2017:
* Frequently contains missing values
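The figure of 174 time series points follows directly from the collection period: counting every month from October 2002 through March 2017, both endpoints included, gives 174. A small sketch of that arithmetic (the function name is an assumption for illustration):

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Count calendar months from start to end, including both endpoints."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


expected_points = months_inclusive(date(2002, 10, 1), date(2017, 3, 1))
print(expected_points)  # 174
```

The same count can serve as a quick completeness check when reading a file: a column with fewer than 174 non-missing values does not cover the whole period.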
Thanks to the U.S. Department of Transportation for collecting this data every month and making it publicly available to us all.
Source: https://www.transtats.bts.gov/Data_Elements.aspx
The airline / airport datasets are perfect for practicing and/or testing time series forecasting with classic statistical models such as autoregressive integrated moving average (ARIMA), or modern deep learning techniques such as long short-term memory (LSTM) networks. The datasets typically show evidence of trends, seasonality, and noise, so modeling and accurate forecasting can be challenging, but still more tractable than time series problems possessing more stochastic elements, e.g. stocks, currencies, commodities, etc. The source releases new data each month, so feel free to check your models' performances against new data as it comes out. I will update the files here every 3 to 6 months depending on how things go.
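Before fitting ARIMA or an LSTM, it can help to have a baseline to beat. The sketch below is a seasonal-naive forecast (repeat the value from the same month one year earlier), which is a deliberately simpler technique than either model named above. The series values are synthetic placeholders, not numbers from the dataset.

```python
def seasonal_naive_forecast(history, horizon, season_length=12):
    """Forecast by repeating the last full season of observations."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic monthly series (24 months), for illustration only.
history = list(range(1, 25))
forecast = seasonal_naive_forecast(history, horizon=3)
print(forecast)  # [13, 14, 15]

# When new monthly data is released, score the forecast against it.
new_months = [25, 26, 27]  # synthetic "newly released" values
print(mean_absolute_error(new_months, forecast))  # 12.0
```

A model that cannot beat this baseline on newly released months is probably not capturing the trend or seasonality in the series.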
A future plan is to build a SQLite database so a vast array of queries can be run against the data. The data in its current time series format is not conducive to this, so coming up with a workable structure for the tables is the first step towards this goal. If you have any suggestions for how I can improve the data presentation, or anything that you would like me to add, please let me know. Looking forward to seeing the questions that we can answer together!
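One possible table structure, offered as a suggestion rather than a settled design, is a single "long" table with one row per airline, airport, month, and metric. The table name, column names, the metric name `metric_a`, and all values below are hypothetical placeholders.

```python
import sqlite3

# Hypothetical long-format schema: one row per (airline, airport, month, metric).
SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    airline TEXT NOT NULL,   -- IATA airline code
    airport TEXT NOT NULL,   -- IATA airport code
    month   TEXT NOT NULL,   -- 'YYYY-MM'
    metric  TEXT NOT NULL,   -- name of the CSV column
    value   REAL,            -- NULL when the source value is missing
    PRIMARY KEY (airline, airport, month, metric)
);
"""


def build_demo_db():
    """Create an in-memory database with a few synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("AA", "ATL", "2002-10", "metric_a", 1.0),
        ("AA", "ATL", "2002-11", "metric_a", 2.0),
        ("DL", "ATL", "2002-10", "metric_a", 3.0),
    ]
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_demo_db()
# Aggregate one metric over all airlines, per airport and month.
totals = conn.execute(
    "SELECT airport, month, SUM(value) FROM observations "
    "WHERE metric = 'metric_a' GROUP BY airport, month ORDER BY month"
).fetchall()
print(totals)  # [('ATL', '2002-10', 4.0), ('ATL', '2002-11', 2.0)]
```

With this layout, the per-airline and per-airport aggregate files become `GROUP BY` queries instead of separate CSVs.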
In 2023, the estimated number of scheduled passengers boarded by the global airline industry amounted to approximately *** billion people. This represents a significant increase compared to the previous year, and the positive trend since the start of the pandemic was forecast to continue in 2024, with the scheduled passenger volume reaching just below **** billion travelers.
Airline passenger traffic
The number of scheduled passengers handled by the global airline industry has increased in all but one year of the last decade. Scheduled passengers refer to the number of passengers who have booked a flight with a commercial airline. Excluded are passengers on charter flights, whereby an entire plane is booked by a private group. In 2023, the Asia Pacific region had the highest share of airline passenger traffic, accounting for ********* of the global total.
https://creativecommons.org/publicdomain/zero/1.0/
Have you ever been stuck in an airport because your flight was delayed or cancelled and wondered if you could have predicted it if you'd had more data? This is your chance to find out.
The 2009 ASA Statistical Computing and Graphics Data Expo consisted of flight arrival and departure details for all commercial flights on major carriers within the USA, from October 1987 to April 2008. This is a large dataset containing nearly 120 million records in total.
The aim of the data expo is to provide a graphical summary of important features of the data set. This is intentionally vague in order to allow different entries to focus on different aspects of the data, but here are a few ideas to get you started:
• When is the best time of day, day of the week, and time of year to fly to minimise delays?
• Do older planes suffer more delays?
• How well does weather predict plane delays?
• How does the number of people flying between different locations change over time?
• Can you detect cascading failures as delays in one airport create delays in others? Are there critical links in the system?
• Use the available variables to construct a model that predicts delays.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Daily data showing UK flight numbers and rolling seven-day average, including flights to, from, and within the UK. These are official statistics in development. Source: EUROCONTROL.
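For readers who want to reproduce a rolling seven-day average from the daily counts, here is a minimal sketch. It assumes a trailing window (the current day plus the six before it); the source does not say whether its window is trailing or centred, and the input values are synthetic.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; None until a full window is available."""
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            running_total -= values[i - window]  # drop the day leaving the window
        averages.append(running_total / window if i >= window - 1 else None)
    return averages


# Synthetic daily flight counts, for illustration only.
daily_flights = [1, 2, 3, 4, 5, 6, 7, 8]
print(rolling_mean(daily_flights))
```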
This layer visualizes over 60,000 commercial flight paths. The data was obtained from openflights.org, and was last updated in June 2014. The site states, "The third-party that OpenFlights uses for route data ceased providing updates in June 2014. The current data is of historical value only. As of June 2014, the OpenFlights/Airline Route Mapper Route Database contains 67,663 routes between 3,321 airports on 548 airlines spanning the globe. Creating and maintaining this database has required and continues to require an immense amount of work. We need your support to keep this database up-to-date." To donate, visit the site and click the PayPal link.
Routes were created using the XY-to-line tool in ArcGIS Pro, inspired by Kenneth Field's work, and following a modified methodology from Michael Markieta (www.spatialanalysis.ca/2011/global-connectivity-mapping-out-flight-routes).
Some cleanup was required in the original data, including adding missing location data for several airports and some missing IATA codes. Before performing the point to line conversion, the key to preserving attributes in the original data is a combination of the INDEX and MATCH functions in Microsoft Excel. Example function: =INDEX(Airlines!$B$2:$B$6200,MATCH(Routes!$A2,Airlines!$D$2:Airlines!$D$6200,0))
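For anyone doing the same attribute join outside Excel, the INDEX/MATCH pattern is an exact-match lookup: find the route's key in one column of the Airlines sheet and return the value from another column on the same row. A rough Python equivalent is sketched below. The codes and names are made-up placeholders, and reading column D as an airline code and column B as an airline name is an assumption about the sheet layout.

```python
def index_match(lookup_value, match_column, return_column):
    """Return the return_column entry on the first row where match_column
    equals lookup_value (like MATCH with match type 0), or None if absent."""
    for key, result in zip(match_column, return_column):
        if key == lookup_value:
            return result
    return None  # Excel would show #N/A here


# Placeholder stand-ins for Airlines!D (codes) and Airlines!B (names).
airline_codes = ["XA", "XB"]
airline_names = ["Example Air", "Sample Airways"]
# Placeholder stand-in for Routes!A.
route_airlines = ["XB", "XA", "ZZ"]

print([index_match(code, airline_codes, airline_names) for code in route_airlines])
# ['Sample Airways', 'Example Air', None]

# For tens of thousands of routes, build the lookup once as a dict.
# setdefault keeps the first occurrence, matching MATCH's behaviour.
name_by_code = {}
for code, name in zip(airline_codes, airline_names):
    name_by_code.setdefault(code, name)
print([name_by_code.get(code) for code in route_airlines])
```

Note that Python string comparison is case-sensitive, whereas Excel's MATCH is not, so codes may need normalising first.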
Passengers enplaned and deplaned at Canadian airports, annual.
Description: Aviation statistics are compiled from data supplied by all Irish airports. The following Irish airports provide data to the Central Statistics Office: Dublin, Cork, Shannon, Kerry, Knock, Waterford, Connemara, Donegal and Inishmore. Galway and Sligo airports ceased operations in 2011. There have been no commercial flights in Waterford Airport since June 2016. Data for the five main airports is supplied on a monthly basis. Data for regional airports is supplied annually to the Central Statistics Office.
This dataset provides a time-series view of international passenger numbers, commercial flights and total freight (tonnes) for the five main airports of Dublin, Shannon, Cork, Knock and Kerry.
Geography available in RDM: Five main airports
Source: CSO Air and Sea Travel Statistics
Weblink: https://www.cso.ie/en/statistics/tourismandtravel/airandseatravelstatistics/
Date of last source data update: August 2023
Update Schedule: Quarterly Update
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Aviation Surveillance: This model could be utilized for real-time monitoring activities on large airfields, helping to identify various aircraft types including commercial planes and military planes.
Military Intelligence: Security forces could use it to analyze satellite or drone footage, distinguishing between military and civilian aircraft. This might be useful in areas of conflict for monitoring enemy activities.
Aircraft Research: Aviation research organizations can use this model for developing and testing new types of aircraft. They can compare the identified class with the expected one to verify if the new plane corresponds to the desired design.
Air Traffic Control Training: The model could be used in simulation training programs for air traffic controllers, assisting in teaching trainees how to distinguish between different classes of planes.
Drone Safety: Drone operators can use this model to help automatic drone collision avoidance systems distinguish between plane types, improving safety by providing appropriate distances to maintain depending on the category of aircraft detected.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
A worldwide aviation accident database covering 1908 to 2019.
There are similar datasets available on Kaggle. This dataset is a cleaned version, and the source code is available on GitHub.
Data is scraped from planecrashinfo.com. Below you can find the dataset column descriptions:
The original data is from the Plane Crash Info website (http://www.planecrashinfo.com/database.htm). The dataset was scraped with Python. The source code is also public on GitHub.
Find the root cause of plane crashes. Find any insights from the dataset, such as:
- Which operators are the worst
- Which aircraft are the worst
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The Airports database is a geographic point database of aircraft landing facilities in the United States and U.S. Territories. Attribute data is provided on the physical and operational characteristics of the landing facility, current usage including enplanements and aircraft operations, congestion levels and usage categories. This geospatial data is derived from the FAA's National Airspace System Resource Aeronautical Data Product.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unmanned Aerial Vehicles (UAV) provide increased access to unique types of urban imagery traditionally not available. Advanced machine learning and computer vision techniques when applied to UAV RGB image data can be used for automated extraction of building asset information and if applied to UAV thermal imagery data can detect potential thermal anomalies. However, these UAV datasets are not easily available to researchers, thereby creating a barrier to accelerating research in this area.
To assist researchers with added data to develop machine learning algorithms, we present UAVID3D (Unmanned Aerial Vehicle (UAV) Image Dataset of the Built Environment for 3D reconstruction). The raw images for our dataset were recorded with a Zenmuse XT2 visual (RGB) and a FLIR Tau 2 (thermal, https://flir.netx.net/file/asset/15598/original/) camera on a DJI Mavic 2 pro drone (https://www.dji.com/matrice-200-series). The thermal camera is factory calibrated. All data is organized and structured to comply with FAIR principles, i.e. being findable, accessible, interoperable, and reusable. It is publicly available and can be downloaded from the Zenodo data repository.
RGB images were recorded during UAV fly-overs of two different commercial buildings in Northern California. In addition, thermographic images were recorded during two subsequent UAV fly-overs of the same two buildings. UAV flights were recorded at flight heights between 60–80 m above ground with a flight speed of 1 m/s and contain GPS information. All images were recorded during drone flights on May 10, 2021 between 8:45 am and 10:30 am and on May 19, 2021 between 2:15 pm and 4:30 pm. Outdoor air temperatures during the flights on these two days were between 78 and 83 degrees Fahrenheit and between 58 and 65 degrees Fahrenheit, respectively.
For the RGB flights, the UAV path was planned and captured using an orbital flight plan in PIX4D capture at normal flight speed and an overlap angle of 10 degrees. Thermal images were captured by manual flights approximately 5 m away from each building facade. Due to the high overlap of images, similarities from feature points identified in each image can be extracted to conduct photogrammetry. Photogrammetry allows estimation of the three-dimensional coordinates of points on an object in a generated 3D space, using measurements made on images taken with a high overlap rate. Photogrammetry can be used to create a 3D point cloud model of the recorded region. The UAVID3D dataset is a series of compressed archive files totaling 21 GB. Useful pipelines to process these images can be found at these two repositories: https://github.com/LBNL-ETA/a3dbr and https://github.com/LBNL-ETA/AutoBFE
This work was supported by the Assistant Secretary for Energy Efficiency and Renewable Energy, Building Technologies Program, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Motivation
The Dataset for Unmanned Aircraft System (UAS) Cellular Communications, DUCC for short, was created with the aim of advancing communications for Beyond Visual Line of Sight (BVLOS) operations. With this objective in mind, datasets were generated to analyse the behaviour of cellular communications for UAS operations.
Measurement
A measurement setup was implemented to execute the measurements. Two Sierra Wireless EM9191 modems with both LTE and 5G capabilities were used to establish a connection to the cellular network and measure the physical parameters of the air-link. Each modem was equipped with four Taoglas antennas, two of type TG 35.8113 and two of type TG 45.8113. A Raspberry Pi 4B was used to capture the measurements. All hardware components were integrated into a box and attached to a DJI Matrice 300 RTK. A connection to the drone controller was established to obtain location, speed and attitude. To measure end-to-end network parameters, dummy data was exchanged bidirectionally between the Raspberry Pi and a server. Both the server and the Raspberry Pi were synchronized with GPS time in order to measure the one-way packet delay. For this purpose, we used iPerf3 and customised it to suit our requirements. To ensure precise positioning of the drone, a Real-Time Kinematic (RTK) station was placed on the ground during the measurements.
The measurements were performed at three distinct rural locations. Waypoint flights were undertaken with the points arranged in a cuboid formation, maximizing the coverage of the air volume. The campaigns were conducted at varying drone speeds. Moreover, for location A, different flight routes with rotated grids were implemented to reduce bias. Finally, a validation dataset is provided for location A, where the waypoints were calculated according to Quality of Service (QoS) based path-planning.
Dataset Structure and Usage
The dataset's structure consists of:
-- Dataset
   |-- LocationX
       |-- RouteX (in case different routes at LocationX were created)
           |-- LocXRouteX.kml (file containing the waypoints in the kml format)
           |-- SpeedXMeterPerSecond (folder containing the datasets recorded with a specific drone speed)
               |-- YYYY-MM-DD hh_mm_ss.s.pkl.gz (Dataset file)
       |-- RouteY
           |-- ...
   |-- ...
The dataset files can be loaded using the pandas module in python3. The file "load.py" provides a sample script for loading a dataset as well as the corresponding .kml file which contains the predefined waypoints. In the file "Parameter_Description.csv" each parameter measured is further explained.
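The sketch below is not the bundled "load.py". It is a hedged illustration of how the folder layout could be walked and a single file read with `pandas.read_pickle`, which infers gzip compression from the `.gz` suffix by default. The root folder name `Dataset` and both function names are assumptions.

```python
from pathlib import Path


def find_dataset_files(root):
    """Return every '*.pkl.gz' dataset file below root, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.pkl.gz"))


def load_dataset(path):
    """Load one dataset file with pandas.

    pandas is imported lazily so that listing files works without it.
    Unpickling can execute arbitrary code, so only load files obtained
    from the official dataset source.
    """
    import pandas as pd

    return pd.read_pickle(path)  # gzip compression inferred from '.gz'


# 'Dataset' is an assumed name for the extracted root folder.
files = find_dataset_files("Dataset")
print(f"found {len(files)} dataset file(s)")
```

Because location, route, and speed are encoded in the folder names, `path.parts` of each returned file can be used to label the loaded data.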
License
All datasets are copyrighted by us and published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. This means that you must attribute the work in the manner specified by the authors, you may not use this work for commercial purposes, and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license. This dataset is made available for academic use only. However, we take your privacy seriously! If you find yourself or personal belongings in this dataset and feel uncomfortable about it, please contact us at automotive@oth-aw.de and we will immediately remove the respective data from our server.
Acknowledgement
The authors gratefully acknowledge the following European Union H2020 -- ECSEL Joint Undertaking project for financial support including funding by the German Federal Ministry for Education and Research (BMBF): ADACORSA (Grant Agreement No. 876019, funding code 16MEE0039).
As a result of the continued annual growth in global air traffic passenger demand, the number of airplanes involved in accidents is on the increase. Although the United States is ranked among the 20 countries with the highest quality of air infrastructure, the U.S. reports the highest number of civil airliner accidents worldwide.
2020 saw more plane crash victims, despite fewer flights
The number of people killed in accidents involving large commercial aircraft rose globally in 2020, even though the number of commercial flights performed that year dropped by 57 percent to 16.4 million. More than half of the total number of deaths were recorded in January 2020, when a Ukrainian plane was shot down in Iranian airspace, a tragedy that killed 176 people. The second fatal incident took place in May, when a Pakistani airliner crashed, killing 97 people.
Changes in aviation safety
In terms of fatal accidents, it seems that aviation safety experienced some decline on a couple of parameters. For example, there were 0.37 jet hull losses per one million flights in 2016. In 2017, passenger flights recorded the safest year in world history, with only 0.11 jet hull losses per one million flights. In 2020, the region with the highest hull loss rate was the Commonwealth of Independent States. These figures do not take into account accidents involving military, training, private, cargo and helicopter flights.
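Two pieces of arithmetic sit behind these figures, sketched here under stated assumptions. A drop of 57 percent to 16.4 million implies roughly 38 million flights the year before, and a "per one million flights" rate is simply events divided by flights, scaled by one million. The function names and the event and flight counts in the second example are hypothetical, chosen only to show the formula.

```python
def value_before_drop(value_after, drop_fraction):
    """Recover the starting value given the value after a fractional drop."""
    return value_after / (1 - drop_fraction)


def rate_per_million(events, flights):
    """Express a count of events as a rate per one million flights."""
    return events / flights * 1_000_000


# Flights (in millions) implied for the year before the 57 percent drop.
implied_previous_year = value_before_drop(16.4, 0.57)
print(round(implied_previous_year, 1))  # 38.1

# Hypothetical counts: 3 hull losses over 10 million flights.
print(rate_per_million(3, 10_000_000))
```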