https://creativecommons.org/publicdomain/zero/1.0/
Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.
This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.
The dataset provided here is a simulated example generated with the online platform Mockaroo. This web-based tool creates customizable synthetic datasets that closely resemble real data. It is primarily intended for developers, testers, and data experts who need sample data for purposes such as testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.
Cover Photo by: Kevin Woblick on Unsplash
Thumbnail by: Airplane icons created by Freepik - Flaticon
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.3/customlicense?persistentId=doi:10.57745/LLRJO0
This database contains data on nearly 230 airplanes. Each airplane is described by 31 parameters, such as name, IATA code, category (general, commuter, regional, short-medium, long range), geometry, mass, max speed, typical cruise Mach number, typical range, typical approach speed, take-off field length, landing field length, number of engines, type of engine, typical engine model, bypass ratio, and max thrust or max power. The database relies on various sources such as manufacturer websites, flight manuals (if available), books, and Eurocontrol aircraft performance data. These data are NOT intended to be used in an operational context. They were gathered to provide orders of magnitude and to find trends between various parameters in a preliminary aircraft design context. Please consider citing the following publication if you reuse this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large data set of go-arounds, also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:
Column name | Type | Description |
---|---|---|
time | date time | UTC time of landing or first GA attempt |
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
callsign | string | Aircraft identifier in air-ground communications |
airport | string | ICAO airport code where the aircraft is landing |
runway | string | Runway designator on which the aircraft landed |
has_ga | string | "True" if at least one GA was performed, otherwise "False" |
n_approaches | integer | Number of approaches identified for this flight |
n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop flights with n_approaches > 2.
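The filter described above can be sketched with pandas. This is a minimal illustration on a tiny made-up frame that mimics a few columns of go_arounds_minimal.csv.gz; the rows, and the choice to keep flights with at most two approaches, are assumptions for illustration rather than the authors' exact preprocessing.

```python
import pandas as pd

# Tiny made-up sample with some columns of go_arounds_minimal.csv.gz
landings = pd.DataFrame(
    {
        "icao24": ["4b1805", "400a2b", "3c6444", "4ca7b1"],
        "airport": ["LSZH", "EGLC", "EDDF", "EIDW"],
        "has_ga": ["False", "True", "False", "True"],
        "n_approaches": [1, 2, 1, 7],
        "n_rwy_approached": [1, 1, 1, 2],
    }
)

# Keep flights with at most two approaches; flights with more are
# treated here as likely training or calibration flights.
regular = landings[landings["n_approaches"] <= 2]

# has_ga is documented as the strings "True"/"False", so convert explicitly.
regular = regular.assign(has_ga=regular["has_ga"].eq("True"))
print(regular[["icao24", "n_approaches", "has_ga"]])
```

If pandas has already parsed has_ga as a boolean column when reading the CSV, the string comparison should be skipped.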
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:
Column name | Type | Description |
---|---|---|
time | date time | UTC time of landing or first GA attempt |
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
callsign | string | Aircraft identifier in air-ground communications |
airport | string | ICAO airport code where the aircraft is landing |
runway | string | Runway designator on which the aircraft landed |
has_ga | string | "True" if at least one GA was performed, otherwise "False" |
n_approaches | integer | Number of approaches identified for this flight |
n_rwy_approached | integer | Number of unique runways approached by this flight |
registration | string | Aircraft registration |
typecode | string | Aircraft ICAO typecode |
icaoaircrafttype | string | ICAO aircraft type |
wtc | string | ICAO wake turbulence category |
glide_slope_angle | float | Angle of the ILS glide slope in degrees |
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
rwy_length | float | Length of the runway in kilometres |
airport_country | string | ISO Alpha-3 country code of the airport |
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
operator_country | string | ISO Alpha-3 country code of the operator |
operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
wind_speed_knts | integer | METAR, surface wind speed in knots |
wind_dir_deg | integer | METAR, surface wind direction in degrees |
wind_gust_knts | integer | METAR, surface wind gust speed in knots |
visibility_m | float | METAR, visibility in m |
temperature_deg | integer | METAR, temperature in degrees Celsius |
press_sea_level_p | float | METAR, sea level pressure in hPa |
press_p | float | METAR, QNH in hPa |
weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft data base, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:
Column name | Type | Description |
---|---|---|
airport | string | ICAO airport code where the aircraft is landing |
runway | string | Runway designator on which the aircraft landed |
n_landings | integer | Total number of landings observed on this runway in 2019 |
ga_rate | float | Go-around rate, per 1000 landings |
glide_slope_angle | float | Angle of the ILS glide slope in degrees |
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
rwy_length | float | Length of the runway in kilometres |
airport_country | string | ISO Alpha-3 country code of the airport |
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
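As a rough sketch of how a per-runway table like this relates to the per-landing data, the following groups made-up landings by airport and runway and derives n_landings and ga_rate (go-arounds per 1000 landings). Whether the published file was computed exactly this way, for example with or without training flights, is an assumption.

```python
import pandas as pd

# Made-up landings using the column names of the minimal data set
landings = pd.DataFrame(
    {
        "airport": ["EGLC", "EGLC", "EGLC", "EGLC", "LSZH", "LSZH"],
        "runway": ["09", "09", "09", "27", "14", "14"],
        "has_ga": ["False", "True", "False", "False", "False", "False"],
    }
)

# Count landings and go-arounds for every airport-runway pair
agg = (
    landings.assign(has_ga=landings["has_ga"].eq("True"))
    .groupby(["airport", "runway"], as_index=False)
    .agg(n_landings=("has_ga", "size"), n_ga=("has_ga", "sum"))
)

# Go-around rate expressed per 1000 landings
agg["ga_rate"] = 1000 * agg["n_ga"] / agg["n_landings"]
print(agg)
```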
Downloading the trajectories
Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical data base with a few lines of Python code. For example, suppose you want to get all the go-arounds on the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
import datetime
from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic
# load minimum data set
df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

# select London City Airport, go-arounds, and 2019-01-04
airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
    tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
    tzinfo=datetime.timezone.utc
)
df_selection = df.query("airport==@airport & has_ga
http://opendatacommons.org/licenses/dbcl/1.0/
The Flights Booking Dataset of various airlines was scraped date-wise from a well-known website in a structured format. The dataset contains records of flight travel details between cities in India. Multiple features are present, such as Source & Destination City, Arrival & Departure Time, Duration & Price of the flight, etc.
This data is available as a CSV file. We are going to analyze this data set using the Pandas DataFrame.
This analysis will be helpful for those working in the airline and travel domains.
Using this dataset, we answered multiple questions with Python in our project.
Q.1. What are the airlines in the dataset, accompanied by their frequencies?
Q.2. Show Bar Graphs representing the Departure Time & Arrival Time.
Q.3. Show Bar Graphs representing the Source City & Destination City.
Q.4. Does price vary with airlines?
Q.5. Does ticket price change based on the departure time and arrival time?
Q.6. How does the price change with a change in source and destination?
Q.7. How is the price affected when tickets are bought just 1 or 2 days before departure?
Q.8. How does the ticket price vary between Economy and Business class?
Q.9. What is the average price of a Vistara flight from Delhi to Hyderabad in Business class?
These are the main Features/Columns available in the dataset :
1) Airline: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines.
2) Flight: Flight stores information regarding the plane's flight code. It is a categorical feature.
3) Source City: City from which the flight takes off. It is a categorical feature having 6 unique cities.
4) Departure Time: This is a derived categorical feature created by grouping time periods into bins. It stores information about the departure time and has 6 unique time labels.
5) Stops: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.
6) Arrival Time: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time.
7) Destination City: City where the flight will land. It is a categorical feature having 6 unique cities.
8) Class: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy.
9) Duration: A continuous feature that displays the overall amount of time it takes to travel between cities in hours.
10) Days Left: This is a derived feature calculated by subtracting the booking date from the trip date.
11) Price: Target variable stores information of the ticket price.
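A question such as Q.9 can be answered with a boolean mask and a mean. The sketch below uses a handful of made-up rows; the column names (airline, source_city, destination_city, class, price) are assumptions about how the CSV labels the features listed above, and the prices are not real.

```python
import pandas as pd

# Made-up rows; column names are assumed, prices are illustrative only
flights = pd.DataFrame(
    {
        "airline": ["Vistara", "Vistara", "Vistara", "Air_India", "Vistara"],
        "source_city": ["Delhi", "Delhi", "Delhi", "Delhi", "Mumbai"],
        "destination_city": ["Hyderabad"] * 5,
        "class": ["Business", "Business", "Economy", "Business", "Business"],
        "price": [40000, 50000, 6000, 42000, 55000],
    }
)

# Select Vistara, Delhi to Hyderabad, Business class
mask = (
    (flights["airline"] == "Vistara")
    & (flights["source_city"] == "Delhi")
    & (flights["destination_city"] == "Hyderabad")
    & (flights["class"] == "Business")
)

avg_price = flights.loc[mask, "price"].mean()
print(avg_price)  # 45000.0 for the made-up rows above
```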
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
AeroSonicDB (YPAD-0523): Labelled audio dataset for acoustic detection and classification of aircraft
Version 1.1.2 (November 2023)
[UPDATE: June 2024]
Version 2.0 is currently in beta and can be found at https://zenodo.org/records/12775560. The repository is currently restricted, however you can gain access by emailing Blake Downward at aerosonicdb@gmail.com, or by submitting the following Google Form.
Version 2 vastly extends the number of aircraft audio samples to over 3,000 (V1 contains 625 aircraft samples), for more than 38 hours of strongly annotated aircraft audio (V1 contains 8.9 hours of aircraft audio).
Publication
When using this data in an academic work, please reference the dataset DOI and version. Please also reference the following paper which describes the methodology for collecting the dataset and presents baseline model results.
Downward, B., & Nordby, J. (2023). The AeroSonicDB (YPAD-0523) Dataset for Acoustic Detection and Classification of Aircraft. ArXiv, abs/2311.06368.
Description
AeroSonicDB:YPAD-0523 is a specialised dataset of ADS-B labelled audio clips for research in the fields of environmental noise attribution and machine listening, particularly acoustic detection and classification of low-flying aircraft. Audio files in this dataset were recorded at locations in close proximity to a flight path approaching or departing Adelaide International Airport's (ICAO code: YPAD) primary runway, 05/23. Recordings are initially labelled from radio (ADS-B) messages received from the aircraft overhead, then human verified and annotated with the first and final moments at which the target aircraft is audible.
A total of 1,895 audio clips are distributed across two top-level classes, "Aircraft" (8.87 hours) and "Silence" (3.52 hours). The aircraft class is then further broken-down into four subclasses, which broadly describe the structure of the aircraft and propulsion mechanism. A variety of additional "airframe" features are provided to give researchers finer control of the dataset, and the opportunity to develop ontologies specific to their own use case.
For convenience, the dataset has been split into training (10.04 hours) and testing (2.35 hours) subsets, with the training set further split into 5 distinct folds for cross-validation. These splits are performed to prevent data-leakage between folds and the test set, ensuring samples collected in the same recording session (distinct in time, location and microphone) are assigned to the same fold.
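One way to keep every recording session inside a single fold is to assign whole sessions to folds rather than individual samples. The sketch below is a simple greedy scheme under that assumption; the function name assign_folds and the session labels are hypothetical, and it is not claimed to be the procedure used to build the published folds.

```python
from collections import defaultdict


def assign_folds(sessions, n_folds=5):
    """Map each sample index to a fold (1..n_folds), keeping sessions intact."""
    # Group sample indices by the session they were recorded in.
    by_session = defaultdict(list)
    for idx, session in enumerate(sessions):
        by_session[session].append(idx)

    fold_sizes = [0] * n_folds
    folds = {}
    # Place the largest sessions first, always into the currently smallest fold.
    ordered = sorted(by_session.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _, members in ordered:
        target = fold_sizes.index(min(fold_sizes))
        for idx in members:
            folds[idx] = target + 1
        fold_sizes[target] += len(members)
    return folds


# Hypothetical session label for each of ten samples
sessions = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5", "s6", "s6"]
folds = assign_folds(sessions)
print(folds)
```

Because a session is never divided, near-identical clips from one recording session cannot appear in both the training and validation part of a cross-validation round.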
Researchers may find applications for this dataset in a number of fields; particularly aircraft noise isolation and noise monitoring in an urban environment, development of passive acoustic systems to assist radar technology, and understanding the sources of aircraft noise to help manufacturers design less-noisy aircraft.
Audio data
ADS-B (Automatic Dependent Surveillance–Broadcast) messages transmitted directly from aircraft are used to automatically trigger, capture and label audio samples. A 60-second recording is triggered when an aircraft transmits a message indicating it is within a specified distance of the recording device (see "Location data" below for specifics). The resulting audio file is labelled with the unique ICAO identifier code for the aircraft, as well as its last reported altitude, date, time, location and microphone. The recording is then human verified and annotated with timestamps for the first and last moments the aircraft is audible. In total, AeroSonicDB contains 625 recordings of low-altitude aircraft - varying in length from 18 to 60 seconds, for a total of 8.87 hours of aircraft audio.
A collection of urban background noise without aircraft (silence) is included with the dataset as a means of distinguishing location specific environmental noises from aircraft noises. 10-second background noise, or "silence" recordings are triggered only when there are no aircraft broadcasting they are within a specified distance of the recording device (see "Location data" below). These "silence" recordings are also human verified to ensure no aircraft noise is present. The dataset contains 1,270 clips of silence/urban background noise.
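The distance-based trigger can be illustrated with a great-circle distance check between the recording device and an aircraft's reported position. This is only a sketch: the coordinates are made up, the helper names are hypothetical, altitude is ignored, and the dataset description does not state how distance was actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ground distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def should_record(device, aircraft, trigger_km):
    """True if the aircraft position lies within the trigger distance."""
    return haversine_km(*device, *aircraft) <= trigger_km


# Made-up (latitude, longitude) pairs, not the real recording locations
device = (-34.90, 138.60)
near_aircraft = (-34.91, 138.61)
far_aircraft = (-34.60, 138.90)

print(should_record(device, near_aircraft, trigger_km=3.0))
print(should_record(device, far_aircraft, trigger_km=3.0))
```

A "silence" recording would correspond to the opposite condition: no tracked aircraft satisfies should_record at that moment.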
Location data
Recordings have been collected from three (3) locations. GPS coordinates for each location are provided in the "locations.json" file. In order to protect privacy, coordinates have been provided for a road or public space nearby the recording device instead of its exact location.
Location: 0
Situated in a suburban environment approximately 15.5km north-east of the start/end of the runway. For Adelaide, typical south-westerly winds bring most arriving aircraft past this location on approach. Winds from the north or east will cause aircraft to take-off to the north-east, however not all departing aircraft will maintain a course to trigger a recording at this location. The "trigger distance" for this location is set to 3km to ensure small/slower aircraft and large/faster aircraft are captured within a sixty-second recording.
"Silence" or ambient background noises at this location include; cars, motorbikes, light-trucks, garbage trucks, power-tools, lawn mowers, construction sounds, sirens, people talking, dogs barking and a wide range of Australian native birds (New Holland Honeyeaters, Wattlebirds, Australian Magpies, Australian Ravens, Spotted Doves, Rainbow Lorikeets and others).
Location: 1
Situated approximately 500m south-east of the south-eastern end of the runway, this location is near recreational areas (golf course, skate park and parklands) with a busy road/highway in between the location and runway. This location features heavy winds and road traffic, as well as people talking, walking and riding, and also birds such as the Australian Magpie and Noisy Miner. The trigger distance for this location is set to 1km. Due to their low altitude, aircraft are louder but audible for a shorter time compared to "Location 0".
Location: 2
As an alternative to "Location 1", this location is situated approximately 950m south-east of the end of the runway. This location has a wastewater facility to the north, a residential area to the south and a popular beach to the west. This location offers greater wind protection and further distance from airport and highway noises. Ambient background sounds feature close proximity cars and motorbikes, cyclists, people walking, nail guns and other construction sounds, as well as the local birds mentioned above.
Aircraft metadata
Supplementary "airframe" metadata for all aircraft has been gathered to help broaden the research possibilities from this dataset. Airframe information was collected and cross-checked from a number of open-source databases. The author has no reason to believe any significant errors exist in the "aircraft_meta" files; however, future versions of this dataset plan to obtain aircraft information directly from ICAO (International Civil Aviation Organization) to ensure a single, verifiable source of information.
Class/subclass ontology (minutes of recordings)
0. no aircraft (211)
0: no aircraft (211)
1. aircraft (533)
1: piston-propeller aeroplane (30)
2: turbine-propeller aeroplane (90)
3: turbine-fan aeroplane (409)
4: rotorcraft (4)
The subclasses are a combination of the "airframe" and "engtype" features. Piston and Turboshaft rotorcraft/helicopters have been combined into a single subclass due to the small number of samples.
Data splits
Audio recordings have been split into training (81%) and test (19%) sets. The training set has further been split into 5 folds, giving researchers a common split to perform 5-fold cross-validation to ensure reproducibility and comparable results. Data leakage into the test set has been avoided by ensuring recordings are disjoint from the training set by time and location, meaning samples in the test set for a particular location were recorded after any samples included in the training set for that particular location.
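The time-and-location rule for the test set can be sketched as a per-location chronological split: at each location, the most recent recordings become the test set. The frame, the recorded_at column, and the 20% fraction below are assumptions for illustration; the published split should be taken from "sample_meta.csv" rather than recomputed.

```python
import pandas as pd

# Made-up sample index: ten recordings alternating between two locations
samples = pd.DataFrame(
    {
        "location": [0, 1] * 5,
        "recorded_at": pd.date_range("2023-05-01", periods=10, freq="D"),
    }
)


def chronological_split(df, test_fraction=0.2):
    """Label the latest recordings at each location as the test set."""
    parts = []
    for _, group in df.sort_values("recorded_at").groupby("location"):
        n_test = max(1, round(len(group) * test_fraction))
        labels = ["train"] * (len(group) - n_test) + ["test"] * n_test
        parts.append(group.assign(split=labels))
    return pd.concat(parts).sort_index()


labelled = chronological_split(samples)
print(labelled)
```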
Labelled data
The entire dataset (training and test) is referenced and labelled in the "sample_meta.csv" file. Each row contains a reference to a unique recording, its meta information, annotations and airframe features.
Alternatively, these labels can be derived directly from the filename of the sample (see below). The "aircraft_meta.csv" and "aircraft_meta.json" files can be used to reference aircraft specific features - such as; manufacturer, engine type, ICAO type designator etc. (see "Columns/Labels" below for all features).
File naming convention
Audio samples are in WAV format, with some metadata stored in the filename.
Basic Convention
"Aircraft ID + Date + Time + Location ID + Microphone ID"
"XXXXXX_YYYY-MM-DD_hh-mm-ss_X_X"
Sample with aircraft
{hex_id} _ {date} _ {time} _ {location_id} _ {microphone_id} . {file_ext}
7C7CD0_2023-05-09_12-42-55_2_1.wav
Sample without aircraft
"Silence" files are denoted with six (6) leading zeros rather than an aircraft hex code. All relevant metadata for "silence" samples are contained in the audio filename, and again in the accompanying "sample_meta.csv"
000000 _ {date} _ {time} _ {location_id} _ {microphone_id} . {file_ext}
000000_2023-05-09_12-30-55_2_1.wav
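Since the labels are encoded in the filename, they can be recovered with a small parser. The sketch below follows the convention shown above; the function name parse_sample_name and the returned dictionary keys are hypothetical, and treating the location and microphone IDs as integers is an assumption.

```python
from datetime import datetime
from pathlib import Path


def parse_sample_name(filename):
    """Split an AeroSonicDB-style filename into its metadata fields."""
    stem = Path(filename).stem  # drop the file extension
    hex_id, date, time, location_id, microphone_id = stem.split("_")
    return {
        "hex_id": hex_id,
        # Six zeros mark a "silence" sample with no aircraft
        "is_silence": hex_id == "000000",
        "recorded_at": datetime.strptime(f"{date} {time}", "%Y-%m-%d %H-%M-%S"),
        "location_id": int(location_id),
        "microphone_id": int(microphone_id),
    }


aircraft = parse_sample_name("7C7CD0_2023-05-09_12-42-55_2_1.wav")
silence = parse_sample_name("000000_2023-05-09_12-30-55_2_1.wav")
print(aircraft)
print(silence)
```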
Columns/Labels
(found in sample_meta.csv, aircraft_meta.csv/json files)
train-test: Train-test split (train, test)
fold: Digit from 1 to 5 splitting the training data 5 ways (else test)
filename: The filename of the audio recording
date: Date of the recording
time: Time of the recording
location: ID for the location of the recording
mic: ID of the microphone used
class: Top-level label for the recording (eg. 0 = No aircraft, 1 = Aircraft audible)
subclass: Subclass label for the recording (eg. 0 = No aircraft, 3 = Turbine-fan aeroplane)
altitude: Approximate altitude of the aircraft (in feet) at the start of the recording
hex_id: Unique ICAO 24-bit address for the aircraft recorded
session: Unique recording
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides detailed information on airline flight routes, fares, and passenger volumes within the United States from 1993 to 2024. The data includes metrics such as the origin and destination cities, distances between airports, the number of passengers, and fare information segmented by different airline carriers. It serves as a comprehensive resource for analyzing trends in air travel, pricing, and carrier competition over a span of three decades.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data that looks at how market structure affects delays for US domestic flights between the years 2004 - 2017.
Data on airline delays come from the Airline On-Time Performance Data (OTPD) from the US Bureau of Transportation Statistics. The data on tail numbers and seat capacity come from the Federal Aviation Administration Aircraft Registry. The data on flight-related weather come from the Local Climatological Data (LCD) provided by the National Centers for Environmental Information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AeroSonic YPAD-0523: Labelled audio dataset for acoustic detection and classification of aircraft
Version 0.2 (June 2023)
Publication
If using this data in an academic work, please reference the DOI and version.
Description
AeroSonic:YPAD-0523 is a specialised dataset of ADS-B labelled audio clips for research in the fields of aircraft noise attribution and machine listening, particularly acoustic detection and classification of low-flying aircraft. Audio files in this dataset were recorded at locations in close proximity to a flight path approaching or departing Adelaide International Airport’s (ICAO code: YPAD) primary runway, 05/23. Recordings are initially labelled from radio (ADS-B) messages received from the aircraft overhead. Each recording is then human verified, and trimmed to the best (subjective) 20 seconds of audio in which the target aircraft is audible.
A total of 1,890 audio clips are balanced across two top-level classes, “Aircraft” (3.57 hours: 642 20-second recordings) and “Silence” (3.37 hours: 1,248 5 and 10-second recordings). The aircraft class is then further broken down into four unbalanced subclasses which broadly describe an aircraft's structure and propulsion mechanism. A variety of additional "airframe" features are provided to give researchers finer control of the dataset, and the opportunity to develop ontologies specific to their own use case.
For convenience, the dataset has been split into training (6.28 hours) and testing (0.66 hours) subsets, with the training set further split into 10 folds for cross-validation. Care has been taken to ensure the class distribution for each subset and fold does not significantly deviate from the overall distribution.
Researchers may find applications for this dataset in a number of fields; particularly aircraft noise isolation and monitoring in an urban environment, development of passive acoustic systems to assist radar technology, and understanding the sources of aircraft noise to help manufacturers design less-noisy aircraft.
Audio data
ADS-B (Automatic Dependent Surveillance–Broadcast) messages transmitted directly from aircraft are used to automatically capture and label audio recordings. A 60-second recording is triggered when an aircraft transmits a message indicating it is within a specified distance of the recording device. The file is labelled with a unique ICAO identifier code for the aircraft, as well as its last recorded altitude, date and time. The recording is then human verified and trimmed to 20 seconds - with the aircraft audible for the duration of the clip.
A balanced collection of urban background noise without aircraft (silence) is included with the dataset as a means of distinguishing location specific environmental noises from aircraft noises. 10-second background noise, or “silence” recordings are triggered only when there are no aircraft broadcasting that they are within a specified distance of the recording device. These "silence" recordings are also human verified to ensure no aircraft noise is present. The dataset contains 1,180 10-second clips, and 68 5-second clips of silence/ambient background noise.
Location information
Recordings have been collected from three (3) locations. GPS coordinates for each location are provided in the "locations.json" file. In order to protect privacy, coordinates have been provided for a road or public space nearby the recording device instead of its exact location.
Location: 0
Situated in a suburban environment approximately 15.5km north-east of the start/end of the runway. For Adelaide, typical south-westerly winds bring most arriving aircraft past this location on approach. Winds from the north or east will cause aircraft to take-off to the north-east, however not all departing aircraft will maintain a course to trigger a recording at this location. The "trigger distance" for this location is set for 3km to ensure small/slower aircraft and large/faster aircraft are captured within a sixty-second recording.
"Silence" or ambient background noises at this location include; cars, motorbikes, light-trucks, garbage trucks, power-tools, lawn mowers, construction sounds, sirens, people talking, dogs barking and a wide range of Australian native birds (New Holland Honeyeaters, Wattlebirds, Australian Magpies, Australian Ravens, Spotted Doves, Rainbow Lorikeets and others).
Location: 1
Situated approximately 500m south-east of the south-eastern end of the runway, this location is near recreational areas (golf course, skate park and parklands) with a busy road/highway in between the location and runway. This location features heavy winds and road traffic, as well as people talking, walking and riding, and also birds such as the Australian Magpie and Noisy Miner. The trigger distance for this location is set to 1km. Due to their low altitude, aircraft are louder but audible for a shorter time compared to "Location 0".
Location: 2
As an alternative to "Location 1", this location is situated approximately 950m south-east of the end of the runway. This location has a wastewater facility to the north, a residential area to the south and a popular beach to the west. This location offers greater wind protection and further distance from airport and highway noises. Ambient background sounds feature close proximity cars and motorbikes, cyclists, people walking, nail guns and other construction sounds, as well as the local birds mentioned above.
Aircraft metadata
Supplementary "airframe" metadata for all aircraft has been gathered to help broaden the research possibilities from this dataset. Airframe information was collected and cross-checked from a number of open-source databases. The author has no reason to believe any significant errors exist in the "aircraft_meta" files; however, future versions of this dataset plan to obtain aircraft information directly from ICAO (International Civil Aviation Organization) to ensure a single, verifiable source of information.
Class/subclass ontology (minutes of recordings)
0. no aircraft (202)
0: no aircraft (202)
1. aircraft (214)
1: piston-propeller aeroplane (12)
2: turbine-propeller aeroplane (37)
3: turbine-fan aeroplane (163)
4: rotorcraft (1.6)
The subclasses are a combination of the "airframe" and "engtype" features. Piston and Turboshaft rotorcraft/helicopters have been combined into a single subclass due to the small number of samples.
Data splits
Audio recordings have been split into training (90.5%) and test (9.5%) sets. The training set has further been split into 10 folds, giving researchers a common split to perform 10-fold cross-validation, ensuring reproducibility and comparable results. Data leakage into the test set has been avoided by ensuring recordings are disjoint from the training set by time and location, meaning samples in the test set for a particular location were recorded after any samples included in the training set for that particular location.
Labelled data
The entire dataset (training and test) is referenced and labelled in the "sample_meta.csv" file. Each row contains a reference to a unique recording and all the labels and features associated with that recording and aircraft.
Alternatively, these labels can be derived directly from the filename of the sample (see below), plus a JSON file which accompanies each aircraft sample. The "aircraft_meta.csv" and "aircraft_meta.json" files can be used to reference aircraft specific features - such as; manufacturer, engine type, ICAO type designator etc. (see below for all 14 airframe features).
File naming convention
Audio samples are in WAV format, and metadata for aircraft recordings are stored in JSON files. Both files share the same name, only differing by their file extension.
Basic Convention
“Aircraft ID + Date + Time + Location ID + Microphone ID”
“XXXXXX_YYYY-MM-DD_hh-mm-ss_X_X”
Sample with aircraft
{hex_id} _ {date} _ {time} _ {location_id} _ {microphone_id} . {file_ext}
7C7CD0_2023-05-09_12-42-55_2_1.wav
7C7CD0_2023-05-09_12-42-55_2_1.json
Sample without aircraft
“Silence” files are denoted with six (6) leading zeros rather than an aircraft hex code. All relevant metadata for “silence” samples are contained in the audio filename, and again in the accompanying “sample_meta.csv”
000000 _ {date} _ {time} _ {location_id} _ {microphone_id} . {file_ext}
000000_2023-05-09_12-30-55_2_1.wav
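The naming convention above can be parsed mechanically. The following is a minimal sketch, not part of the dataset's own tooling: the `SampleName` type and `parse_sample_name` function are hypothetical names introduced here, and the timestamp is kept timezone-naive because the description does not state which time zone the filenames use.

```python
from datetime import datetime
from pathlib import Path
from typing import NamedTuple, Optional


class SampleName(NamedTuple):
    """Fields encoded in a sample filename (hypothetical helper type)."""
    hex_id: Optional[str]   # None for "silence" samples (six leading zeros)
    recorded_at: datetime   # naive: the time zone is not stated in the source
    location_id: str
    microphone_id: str
    file_ext: str


def parse_sample_name(filename: str) -> SampleName:
    """Split '{hex_id}_{date}_{time}_{location_id}_{microphone_id}.{file_ext}'."""
    path = Path(filename)
    hex_id, date, time, location_id, microphone_id = path.stem.split("_")
    recorded_at = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H-%M-%S")
    return SampleName(
        hex_id=None if hex_id == "000000" else hex_id,
        recorded_at=recorded_at,
        location_id=location_id,
        microphone_id=microphone_id,
        file_ext=path.suffix.lstrip("."),
    )
```

For example, `parse_sample_name("000000_2023-05-09_12-30-55_2_1.wav").hex_id` is `None`, which marks the recording as a "silence" sample.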
Columns/Labels
(found in sample_meta.csv, aircraft_meta.csv/json and aircraft recording JSON files)
train-test: Train-test split (train, test)
fold: Digit from 0 to 9 splitting the training subset 10 ways (else test)
filename: The filename of the audio recording
date: Date of the recording
time: Time of the recording
duration: Length of the recording (in seconds)
location_id: ID for the location of the recording
microphone_id: ID of the microphone used
hex_id: Unique ICAO 24-bit address for the aircraft
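As an illustration of how the "train-test" and "fold" columns might be used, the sketch below yields the ten cross-validation splits from the rows of "sample_meta.csv". It assumes the column values are exactly as listed above ("train"/"test", and a digit 0 to 9 for training rows); `load_rows` and `cross_validation_splits` are hypothetical helper names, not part of the dataset.

```python
import csv
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]


def load_rows(csv_path: str) -> List[Row]:
    """Read every row of a metadata CSV as a dict of strings."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def cross_validation_splits(rows: List[Row]) -> Iterator[Tuple[str, List[Row], List[Row]]]:
    """Yield (held_out_fold, fit_rows, validation_rows) for each training fold.

    Rows marked "test" in the "train-test" column are never included.
    """
    train_rows = [row for row in rows if row["train-test"] == "train"]
    folds = sorted({row["fold"] for row in train_rows})
    for held_out in folds:
        fit = [row for row in train_rows if row["fold"] != held_out]
        validate = [row for row in train_rows if row["fold"] == held_out]
        yield held_out, fit, validate
```

Keeping the test rows out of every split preserves the time-and-location separation described under "Data splits".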
This dataset contains Twin Otter Airplane Flight Level Data collected over Granite Mountain during the Mountain Terrain Atmospheric Modeling and Observations Field Experimental Component (MATERHORN-X) project. All data files are in comma-separated value (CSV) format. Time stamps for all MATERHORN-X instruments are in UTC unless stated otherwise. This dataset is from the University of Virginia (UoV). Please refer to the readme for more information.
https://www.usa.gov/government-works/
This dataset provides detailed information on flight arrivals and delays for U.S. airports, categorized by carriers. The data includes metrics such as the number of arriving flights, delays over 15 minutes, cancellation and diversion counts, and the breakdown of delays attributed to carriers, weather, NAS (National Airspace System), security, and late aircraft arrivals. Explore and analyze the performance of different carriers at various airports during this period. Use this dataset to gain insights into the factors contributing to delays in the aviation industry.
Purpose: The purpose of this dataset is to offer insights into the performance of U.S. carriers at various airports during August 2013 - August 2023, focusing on flight arrivals and delays. By providing detailed information on key metrics such as the number of arriving flights, delays over 15 minutes, cancellations, and diversions, the dataset aims to facilitate analyses of factors contributing to delays, including those attributed to carriers, weather, the National Airspace System (NAS), security, and late aircraft arrivals. Researchers, data scientists, and aviation enthusiasts can leverage this dataset to explore patterns, identify trends, and draw conclusions that contribute to a better understanding of the aviation industry's operational challenges.
Structure: The dataset is structured as a tabular format with rows representing unique combinations of year, month, carrier, and airport. Each row contains information on various metrics, including flight counts, delay counts, cancellation and diversion counts, and delay breakdowns by different factors. The columns provide specific details such as carrier codes and names, airport codes and names, and counts of delays attributed to carrier, weather, NAS, security, and late aircraft arrivals. The structured format ensures that users can easily query, analyze, and visualize the data to derive meaningful insights.
Usage: Researchers, analysts, and data enthusiasts can utilize this dataset for a variety of purposes, including but not limited to:
Performance Analysis: Assess the on-time performance of different carriers at specific airports and identify potential areas for improvement.
Trend Identification: Analyze temporal trends in delays, cancellations, and diversions to understand whether certain months or periods exhibit higher operational challenges.
Root Cause Analysis: Investigate the primary contributors to delays, such as carrier-related issues, weather conditions, NAS inefficiencies, security concerns, or late aircraft arrivals.
Benchmarking: Compare the performance of various carriers across different airports to identify industry leaders and areas requiring attention.
Predictive Modeling: Use historical data to develop predictive models for flight delays, aiding in the development of strategies to mitigate disruptions.
Industry Insights: Contribute to a broader understanding of the factors influencing operational efficiency within the U.S. aviation sector.
As users explore and analyze the dataset, they can gain valuable insights that may inform decision-making processes, improve operational strategies, and contribute to a more efficient and reliable air travel experience.
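The performance and root-cause analyses listed above could start from a per-carrier aggregation like the sketch below. The description does not give the exact column headers, so the names used here (`carrier`, `arr_flights`, `arr_del15`, `carrier_ct`, `weather_ct`, `nas_ct`, `security_ct`, `late_aircraft_ct`) are assumptions that should be checked against the CSV header; `delay_summary` is a hypothetical helper.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Mapping

# Assumed column names for the delay counts attributed to each cause.
CAUSES = ("carrier_ct", "weather_ct", "nas_ct", "security_ct", "late_aircraft_ct")


def delay_summary(rows: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, float]]:
    """Per carrier: share of arrivals delayed 15+ minutes and share of delays by cause."""
    totals: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for row in rows:
        carrier = row["carrier"]
        for column in ("arr_flights", "arr_del15") + CAUSES:
            # Empty cells are treated as zero.
            totals[carrier][column] += float(row[column] or 0)

    summary: Dict[str, Dict[str, float]] = {}
    for carrier, total in totals.items():
        flights = total["arr_flights"]
        delayed = total["arr_del15"]
        summary[carrier] = {"delay_rate": delayed / flights if flights else 0.0}
        for cause in CAUSES:
            summary[carrier][cause] = total[cause] / delayed if delayed else 0.0
    return summary
```

Rows can come from `csv.DictReader`. Summing counts before dividing weights each airport and month by its traffic, rather than averaging rates across rows.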
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This is raw data on the worst plane crashes in history, ranked by fatalities. May their souls rest in peace.
BeautifulSoup was used to scrape the raw data. Preprocess it as needed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Original data: https://doi.org/10.7910/DVN/HG7NV7
This data has been rearranged and converted to Parquet.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains production data about most of the commercial/military aircraft ever produced, namely:
Number of units produced per model
Start / End dates of production
Retirement year (in case of military and some commercial aircraft)
Data was extracted from DBPedia and exported as a CSV for ease of use in notebooks.
I was taking a look at this kernel and thought about including more data about the aircraft.
Motivation
The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data has been periodically included in the dataset until the end of the COVID-19 pandemic.
We stopped updating the dataset after December 2022. Previous files have been fixed after a thorough sanity check.
License
See LICENSE.txt
Disclaimer
The data provided in the files is provided as is. Despite our best efforts at filtering out potential issues, some information could be erroneous.
Origin and destination airports are computed online based on the ADS-B trajectories on approach/takeoff: no crosschecking with external sources of data has been conducted. Fields origin or destination are empty when no airport could be found.
Aircraft information comes from the OpenSky aircraft database. Fields typecode and registration are empty when the aircraft is not present in the database.
Description of the dataset
One file per month is provided as a csv file with the following features:
callsign: the identifier of the flight displayed on ATC screens (usually the first three letters are reserved for an airline: AFR for Air France, DLH for Lufthansa, etc.)
number: the commercial number of the flight, when available (the matching with the callsign comes from public open API); this field may not be very reliable;
icao24: the transponder unique identification number;
registration: the aircraft tail number (when available);
typecode: the aircraft model type (when available);
origin: a four letter code for the origin airport of the flight (when available);
destination: a four letter code for the destination airport of the flight (when available);
firstseen: the UTC timestamp of the first message received by the OpenSky Network;
lastseen: the UTC timestamp of the last message received by the OpenSky Network;
day: the UTC day of the last message received by the OpenSky Network;
latitude_1, longitude_1, altitude_1: the first detected position of the aircraft;
latitude_2, longitude_2, altitude_2: the last detected position of the aircraft.
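As a rough sketch of how these features might be combined, the code below derives an observed duration from firstseen/lastseen and a great-circle distance between the first and last detected positions. It assumes the timestamps are ISO 8601 strings that `datetime.fromisoformat` can parse (the description only says they are UTC timestamps), and both function names are hypothetical.

```python
import math
from datetime import datetime, timedelta
from typing import Mapping

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def observed_duration(row: Mapping[str, str]) -> timedelta:
    """Time between the first and last message received for one flight."""
    first = datetime.fromisoformat(row["firstseen"])
    last = datetime.fromisoformat(row["lastseen"])
    return last - first


def great_circle_km(row: Mapping[str, str]) -> float:
    """Haversine distance between the first and last detected positions."""
    lat1, lon1 = math.radians(float(row["latitude_1"])), math.radians(float(row["longitude_1"]))
    lat2, lon2 = math.radians(float(row["latitude_2"])), math.radians(float(row["longitude_2"]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Both values describe what the network observed, not the full flight: coverage gaps can shorten the duration, and the distance is a straight great-circle arc between two points rather than the route flown.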
Examples
Possible visualisations and a more detailed description of the data are available at the following page:
Credit
If you use this dataset, please cite:
Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, and Vincent Lenders "Crowdsourced air traffic data from the OpenSky Network 2019–2020" Earth System Science Data 13(2), 2021 https://doi.org/10.5194/essd-13-357-2021
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geospatial Dataset of GNSS Anomalies and Political Violence Events
Overview
The Geospatial Dataset of GNSS Anomalies and Political Violence Events is a collection of data that integrates aircraft flight information, GNSS (Global Navigation Satellite System) anomalies, and political violence events from the ACLED (Armed Conflict Location & Event Data Project) database.
Dataset Files
The dataset consists of three CSV files:
Data Fields: Daily_GNSS_Anomalies_and_ACLED-2023-V1.csv and Daily_GNSS_Anomalies_and_ACLED-2023-V2.csv
Data Fields: Monthly_GNSS_Anomalies_and_ACLED-2023-V9.csv
The file contains monthly aggregated GNSS anomaly and ACLED event data per grid cell. The structure and meaning of each field are detailed below:
Data Sources
Temporal and Spatial Coverage
This data contains the flight plans for the NSF/NCAR HIAPER Gulfstream V (GV) aircraft flown during the O2/N2 Ratio and CO2 Airborne Southern Ocean (ORCAS) Study. Data covers two test flights and 13 research flights between 5 January and 29 February 2016. The data is comprised of text, csv, kml, png, and html files. It also includes an R-based flight plan tool for viewing the data, with associated instructions and configuration files.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Air accidents are extremely rare, especially in recent years. When a tragedy does occur, however, it is often possible to recover a record of various flight parameters as well as the words spoken in the cockpit just before the disaster. The database presents the last sentence spoken by the flight crew before the crash for 82 disasters for which such records were obtained.
The data was obtained by web scraping, using Python (version 3.10) with the "BeautifulSoup", "requests", "re" and "pandas" packages, along with the "SelectorGadget" add-on, which made working with the site easier. Each row in the database refers to one crash. The data was downloaded from planecrashinfo.com, which aggregates various types of information on air accidents from various sources. The database contains 4 columns: date of the incident, airline, flight number (if available), and a record of the last words.
Photo by Douglas Bagg on Unsplash
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the automated drone flight paths for the vegetation area of the Evans et al. Sawyer Mill dam removal study in Dover, New Hampshire, USA. These flight paths are specific to the reservoir response manuscript. A DJI Phantom 3 Professional drone with its original RGB camera equipped with a polarizing filter was used for the study. All flight paths are available as csv files that were exported out of the Litchi mission hub. The pdf file contains screenshots of the paths in DJIFlightPlanner and Litchi. For information on the Litchi flight app, please see their website here: https://flylitchi.com/. The same set of flight paths were flown in both 2019 and 2020 to keep imagery collection consistent. Nadir flight paths had 80% side image overlap and 90% forward image overlap set in DJIFlightPlanner. The imagery angle for the angled flight paths was set to 20 degrees off nadir (-70 gimbal pitch in Litchi), and the paths of the angled flights were manually drawn in Litchi to supplement the nadir flight paths that were designed in DJIFlightPlanner. For the vegetation area, the altitude of the drone was set to 150 feet above ground level. The angled flight paths had to be traced and executed manually due to an error in Litchi (“waypoint distance too close”). This was not fixed for the 2020 flight date, so the angled flights were manually flown in 2020, as well. These materials were made using resources from an NSF EPSCoR funded project “RII Track-2 FEC: Strengthening the scientific basis for decision-making about dams: Multi-scale, coupled-systems research on ecological, social, and economic trade-offs” (a.k.a. "Future of Dams"). Support for this project is provided by the National Science Foundation’s Research Infrastructure Improvement NSF #IIA-1539071. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
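The flight settings quoted in the description (20 degrees off nadir corresponding to a gimbal pitch of -70, and an altitude of 150 feet) follow from two small conversions, sketched here. The sketch assumes a gimbal convention in which 0 degrees points at the horizon and -90 degrees points straight down, which is consistent with the pair of values given; the function names are hypothetical.

```python
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot


def gimbal_pitch_from_off_nadir(off_nadir_deg: float) -> float:
    """Convert an off-nadir camera angle to a gimbal pitch.

    Assumes 0 degrees is the horizon and -90 degrees is straight down (nadir).
    """
    if not 0.0 <= off_nadir_deg <= 90.0:
        raise ValueError("off-nadir angle must be between 0 and 90 degrees")
    return off_nadir_deg - 90.0


def feet_to_meters(feet: float) -> float:
    """Convert an altitude in feet to meters."""
    return feet * FEET_TO_METERS
```

With these, `gimbal_pitch_from_off_nadir(20)` gives -70.0, and 150 feet is about 45.7 meters above ground level.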
Aerial insects are exceptionally agile and precise owing to their small size and fast neuromotor control. They perform impressive acrobatic maneuvers when they evade predators, recover from wind gust, or land on moving objects. Flapping-wing propulsion is advantageous for achieving flight agility because it can generate large changes of instantaneous forces and torques. During flapping-wing flight, the wings, hinges, and tendons of pterygote insects endure large deformation and high stress hundreds of times each second, highlighting the outstanding flexibility and fatigue resistance of biological structures and materials. In comparison, engineered materials and microscale structures in sub-gram micro-aerial-vehicles (MAVs) exhibit substantially shorter lifespan. Consequently, most sub-gram MAVs are limited to hovering for less than 10 seconds or following simple trajectories at slow speeds. Here, we developed a 750-milligram flapping-wing MAV that demonstrated unprecedented lifespan, sp...
The dataset comprises raw data, including position and Euler angles (using the XYZ convention), collected from a motion-capturing system (Vicon Vantage V5 and Vicon Tracker 3.9). The data was retrieved from Vicon Tracker 3.9 and transmitted in real-time to a target computer (Speedgoat) via asynchronous UDP. All data was saved at 10 kHz on the target computer, with no post-processing applied.
Flight data from: Acrobatics at the insect-scale: a durable, precise, and agile micro-aerial-robot
The data is saved in Comma-Separated Value (.csv) format. The first column of each .csv file represents the time (in seconds) recorded during the flight. The subsequent columns are organized in groups of six: the first three columns show the x, y, and z positions (in meters), and the next three columns contain the Euler angles in the XYZ convention (in radians). The corresponding flight numbers are also included in the column names to demonstrate repeatability.
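The column layout described above (one time column followed by groups of six) can be unpacked as in the sketch below. It assumes each file has a single header row and no empty cells; `read_flights` is a hypothetical helper, and the actual column names in the files are not reproduced here.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_flights(lines: Iterable[str]) -> Tuple[List[float], List[Dict[str, list]]]:
    """Split a flight CSV into a time axis and one record per flight.

    `lines` can be an open file or any iterable of CSV text lines.
    Each flight record holds its six column names, the (x, y, z) positions
    in meters, and the XYZ Euler angles in radians.
    """
    reader = csv.reader(lines)
    header = next(reader)
    if (len(header) - 1) % 6 != 0:
        raise ValueError("expected one time column followed by groups of six")
    n_flights = (len(header) - 1) // 6

    time: List[float] = []
    flights: List[Dict[str, list]] = [
        {"columns": header[1 + 6 * i: 7 + 6 * i], "position": [], "euler_xyz": []}
        for i in range(n_flights)
    ]
    for record in reader:
        values = [float(value) for value in record]
        time.append(values[0])
        for i, flight in enumerate(flights):
            start = 1 + 6 * i
            flight["position"].append(tuple(values[start:start + 3]))
            flight["euler_xyz"].append(tuple(values[start + 3:start + 6]))
    return time, flights
```

Since the data was saved at 10 kHz, consecutive entries in the returned time axis should be about 0.0001 seconds apart.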
The following list shows the filenames and the corresponding flights (in terms of figure numbers) presented in the manuscript:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Airport AEF values. This CSV file gives the AEF of the airports as calculated and used in the current study. Airports are indexed by IATA code, and also by city and country. AEF values are normalized to the range [0, 100]. (CSV 131 kb)
https://creativecommons.org/publicdomain/zero/1.0/
Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.
This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.
https://i.imgur.com/cUFuMeU.png
The dataset provided here is a simulated example and was generated using the online platform found at Mockaroo. This web-based tool offers a service that enables the creation of customizable Synthetic datasets that closely resemble real data. It is primarily intended for use by developers, testers, and data experts who require sample data for a range of uses, including testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.
Cover Photo by: Kevin Woblick on Unsplash
Thumbnail by: Airplane icons created by Freepik - Flaticon