9 datasets found
  1. Aeroplane Crash Data from 1919 to 2025

    • kaggle.com
    Updated Jul 4, 2025
    Cite
    Atharva Koshti (2025). Aeroplane Crash Data from 1919 to 2025 [Dataset]. https://www.kaggle.com/datasets/atharvakoshti/aeroplane-crash-data-from-1919-to-2025
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Atharva Koshti
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🛫 Airplane Crash Data (1919–2025) – Cleaned & Unified

    📌 Overview

    This dataset is a comprehensive, manually curated collection of global aviation accidents and incidents from 1919 to 2025, sourced from five authoritative platforms. It combines historical and modern records into a single, clean, analysis-ready .csv file suited to data science, machine learning, and aviation safety research.

    📂 Sources Used

    The raw data was gathered from the following sources:

    Each source had unique attributes, structures, and formats. I manually extracted, cleaned, de-duplicated, and unified the datasets to generate this high-quality final version.

    🧹 Data Cleaning & Curation

    The dataset preparation involved:

    🧭 Date standardization across multiple formats (including parsing old historical dates)

    🔍 Duplicate removal from overlapping sources

    🛬 Location normalization (city, country, coordinates where possible)

    📉 Fatality/injury counts harmonized into consistent columns

    🧑‍✈️ Flight purpose categorization (commercial, military, training, etc.)

    💥 Cause/description refinement to improve textual analysis usability

    🏷️ Tagging & classification based on incident severity, aircraft type, etc.
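    The date standardization and duplicate removal steps listed above could look roughly like the following. This is a minimal sketch, not the author's actual procedure: the list of date formats, the choice of (date, registration) as the duplicate key, and the sample rows are all assumptions made for illustration.

```python
from datetime import datetime

# Candidate formats are assumptions; the real sources may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def deduplicate(records):
    """Keep the first record seen for each (date, registration) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (parse_date(rec["Date"]), rec["Registration"].strip().upper())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Invented rows: one event reported by two sources in different formats.
rows = [
    {"Date": "03/05/1921", "Registration": "ab-123", "Source": "A"},
    {"Date": "5 Mar 1921", "Registration": "AB-123", "Source": "B"},
    {"Date": "July 12, 1930", "Registration": "CD-456", "Source": "A"},
]
unique_rows = deduplicate(rows)
```

    Rows whose date matches none of the formats get a key of None, so in practice they would need a manual review step rather than silent merging.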

    📊 Columns in cleaned_data.csv (a combination of all the source databases, ready to work on)

    Below is a typical structure of the dataset:

    | Column Name | Description |
    |:------------------|:------------------------------------|
    | Date | Date of the incident |
    | Location | City/Region/Country of the crash |
    | Operator | Airline or aircraft operator |
    | Flight No | Flight number (if available) |
    | Aircraft Type | Type/model of the aircraft |
    | Registration | Aircraft registration number |
    | Fatalities | Total number of fatalities |
    | Aboard | Total number of people on board |
    | Ground Fatalities | Number of people killed on the ground (if any) |
    | Summary | Short description or probable cause |
    | Source | Original source from which the data point was collected |
    | Crash Type | Categorized tag, e.g., mid-air collision, engine failure, pilot error |
    | Year | Extracted year (useful for trend analysis) |

    Note: Not all columns are present in each original file; where possible, missing data has been filled or marked appropriately.
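    As a rough illustration of working with these columns, the sketch below sums fatalities per year using only the Python standard library. The sample rows are invented, and the assumption that missing counts appear as blank cells is mine; the actual file may mark missing data differently.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows that reuse column names from the listing above.
SAMPLE = """Date,Operator,Fatalities,Aboard,Year
1950-01-10,Example Air,10,20,1950
1950-06-02,Sample Lines,,15,1950
1951-03-21,Example Air,5,5,1951
"""


def to_int(value):
    """Convert a CSV cell to int, treating blank cells as missing (None)."""
    value = value.strip()
    return int(value) if value else None


def fatalities_by_year(handle):
    """Sum the Fatalities column per Year, skipping rows with no count."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        deaths = to_int(row["Fatalities"])
        if deaths is not None:
            totals[int(row["Year"])] += deaths
    return dict(totals)


totals = fatalities_by_year(io.StringIO(SAMPLE))
```

    To run this against the real file, the `io.StringIO(SAMPLE)` handle would be replaced with `open("cleaned_data.csv", newline="")`.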

    🔍 Why This Dataset Is Unique

    📅 Over a century of aviation data (1919–2025)

    🔄 Merged from five reputable sources

    🧼 Thorough manual cleaning and validation

    📚 Useful for:

    Aviation safety analysis

    Time-series forecasting

    Natural Language Processing (NLP) on crash summaries

    Machine learning (e.g., predicting crash causes or fatalities)

    📌 Suggested Use Cases

    ✈️ Predictive modeling of aviation risk

    📉 Trend analysis in global air safety

    🗺️ Geographic visualization of accident hotspots

    🤖 NLP classification of crash summaries

    📊 Dashboard creation in Power BI or Tableau

    📁 File Included

    cleaned_data.csv – Final cleaned dataset with unified schema

  2. Bird Strikes in Aviation: Aircraft Collisions

    • kaggle.com
    Updated Nov 15, 2024
    + more versions
    Cite
    Tapendu Karmakar (2024). Bird Strikes in Aviation: Aircraft Collisions [Dataset]. https://www.kaggle.com/datasets/iamtapendu/bird-strike-by-aircafts-data
    Dataset updated
    Nov 15, 2024
    Dataset provided by
    Kaggle
    Authors
    Tapendu Karmakar
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Transport and communication are vital domains within the field of analytics, particularly in addressing safety and environmental concerns linked to the rapid growth of urban areas and increasing air traffic. Among the many risks aviation faces, bird strikes—collisions between aircraft and birds or other wildlife—pose a significant threat. These strikes can cause serious damage to aircraft, particularly jet engines, and have been responsible for some fatal accidents. Bird strikes are most likely to occur during critical flight phases such as take-off, climb, approach, and landing, when aircraft are at lower altitudes and bird activity is higher.

    The dataset provided by the FAA, covering incidents from 2000 to 2011, offers a comprehensive overview of bird strikes in the U.S. It includes detailed visualizations and analyses across several key areas:

    • Trends Over Time: Yearly distribution of bird strike incidents.
    • Airline Impact: Analysis of the top 10 U.S. airlines affected by bird strikes.
    • Airport Incidents: Identification of the 50 U.S. airports with the highest frequency of bird strike incidents.
    • Economic Impact: Yearly costs incurred by airlines and the aviation industry due to bird strikes.
    • Timing and Altitude: When and at what altitude most bird strikes occur.
    • Flight Phase: The phase of flight during which strikes are most likely to happen.
    • Impact Analysis: How bird strikes affect flight operations, including aircraft damage.
    • Pilot Awareness: Correlation between pilot knowledge of potential bird strike risks and the severity of the incidents.

    This dataset offers valuable insights into bird strike patterns, focusing on factors such as aircraft type, location, flight phase, and the specific species involved. By analyzing these variables, it helps identify risk factors and trends, supporting the development of strategies to reduce the frequency and impact of bird strikes, ultimately enhancing aviation safety and risk mitigation.

    Features:

    • AircraftType: The type of aircraft involved in the bird strike incident (e.g., "Airplane").
    • AirportName: The name of the airport where the bird strike occurred (e.g., "LAGUARDIA NY", "DALLAS/FORT WORTH INTL ARPT").
    • AltitudeBin: The altitude range (in feet) at which the bird strike occurred, divided into bins (e.g., "(1000, 2000]", "(30, 50]").
    • MakeModel: The specific make and model of the aircraft involved (e.g., "B-737-400", "MD-80", "A-300").
    • NumberStruck: The number of birds that were struck during the incident (e.g., "Over 100", "1", "26").
    • NumberStruckActual: The actual number of birds that were struck during the incident (e.g., 859, 424, 261).
    • Effect: The effect of the bird strike on the aircraft, indicating whether it caused any damage or not (e.g., "Engine Shut Down", "No damage", "Caused damage").
    • FlightDate: The date of the bird strike incident (e.g., "11/23/00 0:00").
    • Damage: A description of the damage caused by the bird strike (e.g., "Caused damage", "No damage").
    • Engines: The number of engines on the aircraft involved in the bird strike (e.g., 2 engines).
    • Operator: The airline or operator of the aircraft involved in the bird strike (e.g., "US AIRWAYS", "AMERICAN AIRLINES", "ALASKA AIRLINES").
    • OriginState: The U.S. state where the aircraft originated (e.g., "New York", "Texas", "Washington").
    • FlightPhase: The phase of flight during which the bird strike occurred (e.g., "Climb", "Landing Roll", "Approach", "Take-off run").
    • ConditionsPrecipitation: The weather condition related to precipitation at the time of the bird strike (e.g., "None", "Some Cloud").
    • RemainsCollected?: Indicates whether bird remains were collected after the strike (e.g., "True" or "False").
    • RemainsSentToSmithsonian: Indicates whether the bird remains were sent to the Smithsonian Institution for study (e.g., "True" or "False").
    • Remarks: Additional comments or notes related to the incident, including specific details like the number of birds involved, actions taken, or other observations (e.g., "FLYING UNDER A VERY LARGE FLOCK OF BIRDS", "BIRD REMAINS ON F/O WINDSCREEN").
    • WildlifeSize: The size of the bird or wildlife involved in the strike (e.g., "Small", "Medium").
    • ConditionsSky: The sky condition at the time of the bird strike (e.g., "No Cloud", "Some Cloud").
    • WildlifeSpecies: The species of the bird or wildlife involved in the strike (e.g., "European starling", "Rock pigeon", "Unknown bird - medium").
    • PilotWarned: Indicates whether the pilot was warned about the potential for a bird strike (e.g., "Y" for Yes, "N" for No).
    • Cost: The cost incurred as a result of the bird strike (e.g., financial cost to repair damage or related expenses, usually in monetary value like 30,736).
    • Altitude: The specific alt...
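    Several of the features above are stored as formatted strings rather than numbers. The helpers below are a small sketch of how the example values quoted in the list ("(1000, 2000]", "11/23/00 0:00", 30,736) might be parsed. The function names are hypothetical, and real rows may contain variants these patterns do not cover.

```python
import re
from datetime import datetime


def parse_altitude_bin(text):
    """Parse an interval label like "(1000, 2000]" into (low, high) in feet."""
    match = re.fullmatch(r"\((\d+),\s*(\d+)\]", text.strip())
    if match is None:
        raise ValueError(f"unrecognized altitude bin: {text!r}")
    return int(match.group(1)), int(match.group(2))


def parse_flight_date(text):
    """Parse a FlightDate such as "11/23/00 0:00" (two-digit year)."""
    return datetime.strptime(text.strip(), "%m/%d/%y %H:%M")


def parse_cost(text):
    """Parse a Cost value written with thousands separators, e.g. "30,736"."""
    return int(text.replace(",", ""))


low, high = parse_altitude_bin("(1000, 2000]")
when = parse_flight_date("11/23/00 0:00")
cost = parse_cost("30,736")
```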
  3. Aviation Safety Reports Text Classification

    • kaggle.com
    zip
    Updated Jan 7, 2023
    Cite
    The Devastator (2023). Aviation Safety Reports Text Classification [Dataset]. https://www.kaggle.com/datasets/thedevastator/aviation-safety-reports-text-mining-classificati/discussion
    Available download formats: zip (12071571 bytes)
    Dataset updated
    Jan 7, 2023
    Authors
    The Devastator
    Description

    Aviation Safety Reports Text Classification

    Using Reports to Discover Incidents and Problem Types

    By US Open Data Portal, data.gov [source]

    About this dataset

    This U.S. Government Works Aviation Safety Reports Dataset for Text Mining is part of the SIAM 2007 Text Mining Competition dataset which has been used to create algorithms to classify documents according to the types of problems described. The documents in this dataset consist of reports on incidents that occurred during certain flights and are collected from human-generated reports as part of the Aviation Safety Reporting System (ASRS). The files for this competition come in raw text format, with each row representing a single document and its associated problem type label.

    This dataset provides insight into aviation safety incidents and is a useful resource for researchers developing text mining techniques that categorize documents by their contents. Analyzing these documents can help identify potential safety issues, both within individual aircraft operations and more broadly, supporting domestic flight safety as ever-increasing numbers of people travel by air.

    How to use the dataset

    This dataset contains aviation safety reports which have been labelled according to the type of problem that occurred during a certain flight. It is a great resource for developing text mining algorithms for document classification.

    Research Ideas

    • Build an AI-powered Machine Learning classifier to identify problematic aviation incidents more quickly and accurately.
    • Predict the risk of a particular flight, taking into consideration the type of incident that has occurred before on a similar flight.
    • Construct an interactive, searchable interface that lets users analyze and visualize aviation safety reports, uncover trends, and suggest improvements for stakeholders across the sector, such as regulators, airlines, aircraft operators, and pilots.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: testtruth-csv-gz-3.csv

    | Column name | Description |
    |:--------------|:------------------------------------|
    | -1 | Document Number (String) |
    | -1.1 | Aircraft Autopilot Problem (String) |
    | -1.2 | Auxiliary Power Problem (String) |
    | -1.3 | Avionics Problem (String) |
    | -1.4 | Cabin Pressure Problem (String) |
    | -1.5 | Communications Problem (String) |
    | -1.6 | Electrical System Problem (String) |
    | -1.7 | Engine Problem (String) |
    | -1.8 | Fire/Smoke Problem (String) |
    | -1.9 | Fuel System Problem (String) |
    | -1.10 | Ground Service Problem (String) |
    | -1.11 | Hydraulic System Problem (String) |
    | -1.12 | Ice/Frost Problem (String) |
    | -1.13 | Landing Gear Problem (String) |
    | -1.14 | Maintenance Problem (String) |
    | -1.15 | Navigation Problem (String) |
    | -1.16 | Oxygen System Problem (String) |
    | -1.17 | Structural Problem (String) |
    | -1.18 | Other Problem (String) |

    File: traincategorymatrix-csv-gz-5.csv

    | Column name | Description |
    |:--------------|:------------------------------------|
    | -1 | Document Number (String) |
    | -1.1 | Aircraft Autopilot Problem (String) |
    | -1.2 | Auxiliary Power Problem (String) |
    | -1.3 | Avionics Problem (String) |
    | -1.4 | Cabin Pressure Problem (String) |
    | -1.5 | Communications Problem (String) |
    | -1.6 | Electrical System Problem (String) |
    | -1.7 | Engine Problem (String) |
    | -1.8 | Fire/Smoke Problem (String) |
    | -1.9 | Fuel System Problem (String) |
    | -1.10 | Ground Service Problem (String) |
    | -1.11 | Hydraulic System Problem (String) |
    | -1.12 | Ice/Frost Problem (String) |
    | -1.13 | Landing Gear Problem (String) |
    | -1.14 | Maintenance Problem (String) |
    | -1.15 | Navigation Problem (String) |
    | -1.16 | Oxyg...
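    The file layout is not documented in this listing, so the following is only a sketch under stated assumptions. Column names such as -1, -1.1, -1.2 resemble what a CSV reader produces when a headerless first row of -1 values is read as a header, which suggests (but does not confirm) that the files have no header row and use -1/1 flags. The sample rows and the three-category subset below are invented for illustration.

```python
import csv
import io

# First few category names from the column listing above.
CATEGORIES = [
    "Aircraft Autopilot Problem",
    "Auxiliary Power Problem",
    "Avionics Problem",
]

# Invented rows. Assumed layout: document number, then one -1/1 flag
# per category, with no header row.
SAMPLE = "1,-1,1,-1\n2,1,1,-1\n"


def read_labels(handle, categories):
    """Map each document number to the categories flagged with 1."""
    labels = {}
    for row in csv.reader(handle):
        doc_id, flags = row[0], row[1:]
        labels[doc_id] = [
            name for name, flag in zip(categories, flags) if int(flag) == 1
        ]
    return labels


labels = read_labels(io.StringIO(SAMPLE), CATEGORIES)
```

    Reading the file without a header and supplying names explicitly avoids losing the first document to a misread header row, if the assumption about the layout holds.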

  4. ntsb-aviation-accidents

    • kaggle.com
    zip
    Updated Oct 24, 2025
    Cite
    Mirza Niaz Morshed (2025). ntsb-aviation-accidents [Dataset]. https://www.kaggle.com/datasets/mirzaniazmorshed/ntsb-aviation-accidents
    Available download formats: zip (132661469 bytes)
    Dataset updated
    Oct 24, 2025
    Authors
    Mirza Niaz Morshed
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides detailed records of U.S. civil aviation accidents and incidents investigated by the National Transportation Safety Board (NTSB) from 1982 to the present. Converted from official NTSB public records, it includes structured CSV tables covering accident circumstances, aircraft and flight details, personnel information, outcomes, and probable cause determinations.

    Contents:

    Multiple CSV files (one per data table) such as: accidents, aircraft, flight crew, events, injuries, and more.

    Each table is linked by a unique event identifier (ev_id), enabling relational analysis across files.

    Key fields: event date and location, aircraft type, operator, weather conditions, phase of operation, damage extent, injury counts, and full “Probable Cause” narratives.
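    Because the tables share the ev_id key, records from different CSV files can be combined with an ordinary join. The sketch below shows one way to do that with the standard library. Only ev_id comes from the description; the table contents and the other column names (event_date, aircraft_model) are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented tables; only the ev_id column name is confirmed by the description.
EVENTS = "ev_id,event_date\nE1,1999-04-01\nE2,2005-08-15\n"
AIRCRAFT = "ev_id,aircraft_model\nE1,Model A\nE1,Model B\nE2,Model C\n"


def index_by_event(handle):
    """Group the rows of one table by their ev_id value."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["ev_id"]].append(row)
    return grouped


def join_on_event(events_handle, aircraft_handle):
    """Inner join: one output row per (event, aircraft) pair sharing an ev_id."""
    aircraft = index_by_event(aircraft_handle)
    joined = []
    for event in csv.DictReader(events_handle):
        for craft in aircraft.get(event["ev_id"], []):
            joined.append({**event, **craft})
    return joined


joined = join_on_event(io.StringIO(EVENTS), io.StringIO(AIRCRAFT))
```

    An event with two aircraft (such as a collision) yields two joined rows, which matters when counting accidents rather than aircraft.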

    Use Cases:

    Aviation safety analysis and risk modeling

    Root cause and human factors studies

    Training and benchmarking machine learning models on real-world accident data

    Data visualization and exploratory research

    Data Source and Provenance:

    Source: U.S. National Transportation Safety Board (NTSB)

    Original dataset (“avall.zip”/“avall.mdb”) publicly distributed by the NTSB and converted to CSV for accessibility

    Time Coverage: January 1982 – Present (updated annually by the NTSB)

    License: Public US government data – most suitable as CC0: Public Domain (verify for your purposes)

    Notes:

    Some records may have missing or redacted fields due to ongoing investigations or privacy protection.

    Column-level documentation is available in the data explorer for each CSV file.

    For full documentation of fields and code definitions, refer to the included data dictionaries or the NTSB website.

    Typical Applications:

    Accident rate trends by aircraft type or operator

    Statistical or machine learning classification of accident causes

    Visualization of accident distribution by geography, time, or weather conditions

    Please cite the NTSB as the original data source in all derived works.

  5. Dataset: 2023 GPS Anomalies, NOTAMs, and Aircraft Traffic

    • zenodo.org
    zip
    Updated Jun 14, 2024
    Cite
    Eugene Pik (2024). Dataset: 2023 GPS Anomalies, NOTAMs, and Aircraft Traffic [Dataset]. http://doi.org/10.5281/zenodo.11420433
    Available download formats: zip
    Dataset updated
    Jun 14, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eugene Pik
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset: 2023 GPS Anomalies, NOTAMs, and Aircraft Traffic

    The dataset "2023 GPS Anomalies, NOTAMs, and Aircraft Traffic" was collected and generated for the paper "Detecting GPS Anomalies in Aviation Using ADS-B: Correlating Coordinate Gaps and GPS Deviations with NOTAM Warnings."

    This dataset provides a collection of geospatial and temporal data necessary for analyzing potential GPS anomalies in aviation. The data sources include NOTAMs received from the FAA, and the aircraft traffic and GPS information calculated and extracted from the OpenSky Trino ADS-B database.

    The FAA_and_ICAO_locations file includes 21,382 records with identifiers, coordinates, and detailed facility information. This dataset serves as a reference for analyzing the geographical distribution of aviation facilities. The Flights_per_Hour_per_Grid file, with 74,219,036 records, provides hourly flight movement counts within specified grids, offering insights into air traffic patterns and potential disruptions. The GPS_Jumps_from_Routes file, comprising 5,878,275 records, documents deviations in flight paths, capturing metrics such as distances, speeds, and timestamps. This data is crucial for identifying potential GPS spoofing incidents by analyzing unusual jumps between consecutive data points.

    The GPS_Missing_Coordinates file, with 53,232 records, highlights periods of missing GPS signals, indicating possible GPS jamming events. This file includes start and end times, distances between known coordinates, and Navigation Integrity Category (NIC) values to assess data quality during null periods. The NOTAM_ICAO_GPS and NOTAM_USA files, with 30,160 and 234,205 records respectively, provide detailed information on NOTAM areas, including geographic areas, active periods, and categories. This allows for an analysis of the spatial and temporal correlation between NOTAM warnings and GPS anomalies, facilitating a better understanding of the impact of GPS disruptions on aviation safety and operations.
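    The spatial and temporal correlation described above can be illustrated with a simple check: does an anomaly fall inside a circular NOTAM area while that NOTAM is active? The sketch below is not the paper's method. It uses the haversine great-circle distance, and the dictionary keys lat, lon, and time, as well as the numeric timestamps, are assumptions (the format of coordinates_center is not specified here).

```python
import math

# Mean Earth radius converted from kilometres to nautical miles.
EARTH_RADIUS_NM = 6371.0088 / 1.852


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def overlaps(anomaly, notam):
    """True if the anomaly lies inside the NOTAM circle while the NOTAM is active."""
    in_time = notam["time_start"] <= anomaly["time"] <= notam["time_end"]
    gap = distance_nm(anomaly["lat"], anomaly["lon"], notam["lat"], notam["lon"])
    return in_time and gap <= notam["radius_nm"]


# Invented values; timestamps are plain numbers (e.g. epoch seconds).
notam = {"lat": 0.0, "lon": 0.0, "radius_nm": 100.0, "time_start": 1000, "time_end": 2000}
inside = overlaps({"lat": 0.0, "lon": 1.0, "time": 1500}, notam)
too_late = overlaps({"lat": 0.0, "lon": 1.0, "time": 2500}, notam)
too_far = overlaps({"lat": 0.0, "lon": 5.0, "time": 1500}, notam)
```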

    Summary Table

    | Category | File Names | Total Records | Columns |
    |:---------|:-----------|:--------------|:--------|
    | FAA and ICAO Locations | FAA_and_ICAO_locations.csv, FAA_and_ICAO_locations.dpkg | 21,382 | WKT, id, fid, Location_ID, ICAO_ID, IATA_ID, FAA_Location_Code, Facility_Type, Facility_Name, FAA_New_Location_Code, Coordinates, lat, lon, Region, Country_Code, Country, State_Id, State_Name, City, Location, Effective_Date, Site_Id, ADO, ARTCC_Id, ARTCC_Computer_ID, ARTCC_Name, Tie_In_FSS_Id, Tie_In_FSS_Name, NOTAM_Facility_Id, NOTAM_Service |
    | Flights per Hour per Grid | Flights_per_Hour_per_Grid-2023.csv, Flights_per_Hour_per_Grid-2023.dpkg | 74,219,036 | grid_id, hour, movement_count, geometry |
    | GPS Jumps from Routes (possible spoofing) | GPS_Jumps_from_Routes-2023.csv, GPS_Jumps_from_Routes-2023.dpkg | 5,878,275 | WKT, id, fid, icao24, callsign, time_before_spoofing, time_of_spoofing, distance, time_difference, speed_m_s, time_start, time_end |
    | GPS Missing Coordinates (possible jamming) | GPS_Missing_Coordinates-2023.csv, GPS_Missing_Coordinates-2023.dpkg | 53,232 | WKT, id, icao24, callsign, null_start_time, null_end_time, time_of_previous_not_null_coords, time_of_next_not_null_coords, between_coords_distance_m, null_duration_seconds, between_coords_duration_seconds, avg_nic, min_nic, max_nic, start_time, end_time, start_y, end_x, end_y, start_x |
    | NOTAM ICAO GPS | NOTAM_ICAO_GPS-2023.csv, NOTAM_ICAO_GPS-2023.dpkg | 30,160 | WKT, id, fid, notam_id, category_name, coordinates_center, radius_nm, radius_mod_nm, notam_number, accountability, location_id, icao_id, domestic_text, icao_text, type, category_id, time_start, time_end |
    | NOTAM USA | NOTAM_USA-2023.csv, NOTAM_USA-2023.dpkg | 234,205 | WKT, id, fid, notam_id, category_name, is_circle, coordinates_polygon, coordinates_center, radius_nm, faa_location_code, is_faa_location, location_id, is_restricted_area, restricted_area_id, restricted_area_code, category_id, message, notam_number, notam_accountability, moa, type, time_start, time_end |

    Details

    1. FAA_and_ICAO_locations.csv and FAA_and_ICAO_locations.dpkg

    • Total Records: 21,382
    • Columns:
      • WKT: Well-Known Text representation of a point in the CSV file, or a geometry field in the DPKG file.
      • id: Unique identifier for each record.
      • fid: Feature identifier.
      • Location_ID: Identifier for the location.
      • ICAO_ID: ICAO (International Civil Aviation Organization) identifier.
      • IATA_ID: IATA (International Air Transport Association) identifier.
      • FAA_Location_Code: FAA location code.
      • Facility_Type: Type of facility (e.g., airport, heliport).
      • Facility_Name: Name of the facility.
      • FAA_New_Location_Code: New location code by FAA.
      • Coordinates: Coordinates of the location.
      • lat: Latitude of the location.
      • lon: Longitude of the location.
      • Region: Geographical region of the location.
      • Country_Code: Country code of the location.
      • Country: Country name of the location.
      • State_Id: State identifier.
      • State_Name: Name of the state.
      • City: City name.
      • Location: General location information.
      • Effective_Date: Effective date of the record.
      • Site_Id: Site identifier.
      • ADO: Airport District Office.
      • ARTCC_Id: ARTCC (Air Route Traffic Control Center) identifier.
      • ARTCC_Computer_ID: ARTCC computer identifier.
      • ARTCC_Name: Name of the ARTCC.
      • Tie_In_FSS_Id: Tie-in Flight Service Station identifier.
      • Tie_In_FSS_Name: Name of the Tie-in Flight Service Station.
      • NOTAM_Facility_Id: NOTAM (Notice to Airmen) facility identifier.
      • NOTAM_Service: Indicates if NOTAM service is available (Y/N).

    2. Flights_per_Hour_per_Grid-2023.csv and Flights_per_Hour_per_Grid-2023.dpkg

    • Total Records: 74,219,036
    • Columns:
      • grid_id: Identifier for the grid.
      • hour: Timestamp for the hour.
      • movement_count: Number of flights in each 0.5x0.5 degree grid during each hour of year 2023.
      • geometry: Well-Known Text representation of a polygon in the CSV file, or a geometry field in the DPKG file.
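    To make the 0.5x0.5 degree grid concrete, the sketch below maps a latitude/longitude pair to the bounds of its containing cell and writes that cell as a WKT polygon. It assumes cells are aligned to multiples of 0.5 degrees; the dataset's own alignment and its grid_id numbering are not documented in this listing, so the function names and layout here are hypothetical.

```python
import math

CELL_DEG = 0.5  # cell size in degrees, from the movement_count description


def grid_cell(lat, lon):
    """Return (south, west, north, east) bounds of the cell containing a point."""
    south = math.floor(lat / CELL_DEG) * CELL_DEG
    west = math.floor(lon / CELL_DEG) * CELL_DEG
    return south, west, south + CELL_DEG, west + CELL_DEG


def cell_wkt(lat, lon):
    """Write the containing cell as a closed WKT polygon (x = lon, y = lat)."""
    s, w, n, e = grid_cell(lat, lon)
    return f"POLYGON(({w} {s}, {e} {s}, {e} {n}, {w} {n}, {w} {s}))"


bounds = grid_cell(40.7, -74.2)
polygon = cell_wkt(40.7, -74.2)
```

    Using `math.floor` rather than `int` keeps cells consistent for negative coordinates, where truncation toward zero would place points in the wrong cell.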

    3. GPS_Jumps_from_Routes-2023.csv and GPS_Jumps_from_Routes-2023.dpkg

    • Total Records: 5,878,275
    • Columns:
      • WKT: Well-Known Text representation of a linestring in the CSV file, or a geometry field in the DPKG file.
      • id: Unique identifier for each record.
      • fid: Feature identifier.
      • icao24: ICAO 24-bit aircraft address.
      • callsign: Callsign of the aircraft.
      • time_before_spoofing: Timestamp before the spoofing event.
      • time_of_spoofing: Timestamp of the spoofing event.
      • distance: Distance of the jump in meters.
      • time_difference: Time difference between two coordinates in seconds.
      • speed_m_s: Speed in meters per second.
      • time_start: Start time of the record.
      • time_end: End time of the record.
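    The distance, time_difference, and speed_m_s columns are related by speed = distance / time. The sketch below recomputes that value and flags jumps whose implied speed is implausible for an aircraft. The 350 m/s threshold is a hypothetical choice for illustration, not a value taken from the dataset or the paper.

```python
# Hypothetical cutoff, a little above the speed of sound at sea level.
MAX_PLAUSIBLE_SPEED_M_S = 350.0


def implied_speed(distance_m, time_difference_s):
    """Speed in m/s implied by a jump of distance_m over time_difference_s."""
    if time_difference_s <= 0:
        raise ValueError("time_difference must be positive")
    return distance_m / time_difference_s


def is_suspicious_jump(distance_m, time_difference_s, limit=MAX_PLAUSIBLE_SPEED_M_S):
    """True if the implied speed exceeds the plausibility limit."""
    return implied_speed(distance_m, time_difference_s) > limit


fast = is_suspicious_jump(5000.0, 10.0)    # 500 m/s
normal = is_suspicious_jump(2500.0, 10.0)  # 250 m/s
```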

    4. GPS_Missing_Coordinates-2023.csv and GPS_Missing_Coordinates-2023.dpkg

    • Total Records: 53,232
    • Columns:
      • WKT: Well-Known Text representation of a linestring in the CSV file, or a geometry field in the DPKG file.
      • id: Unique identifier for each record.
      • icao24: ICAO 24-bit aircraft address.
      • callsign: Callsign of the aircraft.
      • null_start_time: Start time of missing GPS coordinates.
      • null_end_time: End time of missing GPS coordinates.
      • time_of_previous_not_null_coords: Time of the last known good GPS coordinates before the null period.
      • time_of_next_not_null_coords: Time of the first known good GPS coordinates after the null
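    Judging by their names, null_duration_seconds and between_coords_duration_seconds appear to be differences between the timestamp columns above; that reading is an inference, not something the listing states. A minimal sketch of the arithmetic, assuming ISO-style timestamps (the actual format in the files may differ) and an invented row:

```python
from datetime import datetime


def seconds_between(start, end):
    """Elapsed seconds between two ISO-format timestamp strings."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()


# Invented row using column names from the list above.
row = {
    "time_of_previous_not_null_coords": "2023-05-01 11:59:55",
    "null_start_time": "2023-05-01 12:00:00",
    "null_end_time": "2023-05-01 12:03:30",
    "time_of_next_not_null_coords": "2023-05-01 12:03:40",
}

null_duration = seconds_between(row["null_start_time"], row["null_end_time"])
between_coords_duration = seconds_between(
    row["time_of_previous_not_null_coords"],
    row["time_of_next_not_null_coords"],
)
```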

  6. Airplane Crashes and Fatalities upto 2023

    • kaggle.com
    zip
    Updated Dec 30, 2023
    Cite
    Nayan Subedi (2023). Airplane Crashes and Fatalities upto 2023 [Dataset]. https://www.kaggle.com/datasets/nayansubedi1/airplane-crashes-and-fatalities-upto-2023
    Available download formats: zip (638766 bytes)
    Dataset updated
    Dec 30, 2023
    Authors
    Nayan Subedi
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Description:

    Explore a comprehensive dataset detailing the history of airplane crashes and fatalities worldwide from 1908 to 2023. This dataset encapsulates invaluable information for researchers, aviation enthusiasts, and safety experts interested in understanding the dynamics, trends, and patterns of aviation incidents over more than a century.

    Content:

    This dataset contains meticulously curated information on various aspects of airplane crashes, including but not limited to:

    • Date and time of the incident
    • Location (country, city)
    • Airline and flight number
    • Aircraft type and registration
    • Summary of the incident
    • Number of passengers and crew onboard
    • Fatalities among passengers, crew, and ground personnel
    • Probable cause(s) of the crash

    Insights and Analysis:

    Through this dataset, delve into the trends and patterns that have shaped aviation safety and industry regulations over time. Analyze factors such as geographical distribution, aircraft models, airlines involved, weather conditions, and potential causes of crashes to derive valuable insights and contribute to enhanced safety protocols and risk management strategies.

    Applications:

    • Safety Improvement: Identify recurring factors contributing to crashes and propose preventive measures.
    • Regulatory Enhancements: Inform policymakers and regulatory bodies to enact measures that ensure safer skies.
    • Research and Education: Enable academic research and educational resources to study aviation safety comprehensively.

    Acknowledgments:

    PlaneCrashInfo (https://www.planecrashinfo.com): A vital resource in the documentation and dissemination of aviation incident information, contributing significantly to the wealth of knowledge available for this dataset.

    This dataset builds on the work of CGurkan (https://www.kaggle.com/datasets/cgurkan/airplane-crash-data-since-1908), who compiled the initial airplane crash data covering 1908 to 2019. We extend our sincere gratitude for their efforts in creating the groundwork for this comprehensive dataset.

    Furthermore, significant updates and enhancements have been made to the original dataset, incorporating additional information, refining existing records, and ensuring the dataset's accuracy and relevance over time. We're deeply appreciative of the collaborative spirit that underlies the improvement and evolution of this dataset.

    We also acknowledge the contributions of various entities, including aviation safety organizations, researchers, data providers, and community members, whose collective dedication has enriched and expanded this dataset, making it a more robust resource for studying aviation incidents and promoting safety measures.

    Your ongoing support and contributions are instrumental in advancing the understanding and improvement of aviation safety worldwide.

    Usage Policy:

    This dataset is provided for educational and analytical purposes. Users are encouraged to attribute the data source appropriately and contribute responsibly to the aviation safety domain.

    Feedback and Collaboration:

    We welcome feedback, contributions, and collaborations from the community to enhance the dataset's accuracy, completeness, and utility. Together, let's strive to make aviation safer for everyone.

  7. 2023 GPS Anomalies, NOTAMs, and Aircraft Traffic

    • zenodo.org
    zip
    Updated Jun 2, 2024
    Cite
    Eugene Pik (2024). 2023 GPS Anomalies, NOTAMs, and Aircraft Traffic [Dataset]. http://doi.org/10.5281/zenodo.11411992
    Available download formats: zip
    Dataset updated
    Jun 2, 2024
    Dataset provided by
    Zenodo
    Authors
    Eugene Pik
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset "2023 GPS Anomalies, NOTAMs, and Aircraft Traffic" was collected and generated for the paper "Detecting GPS Anomalies in Aviation Using ADS-B: Correlating Coordinate Gaps and GPS Deviations with NOTAM Warnings."

    This dataset provides a comprehensive collection of geospatial and temporal data necessary for analyzing potential GPS anomalies in aviation. The data sources include NOTAMs received from the FAA, and GPS information calculated and extracted from the OpenSky Trino ADS-B database. The dataset contains files such as FAA_and_ICAO_locations, Flights_per_Hour_per_Grid, GPS_Jumps_from_Routes (possible spoofing), GPS_Missing_Coordinates (possible jamming), and two sets of NOTAMs (ICAO and USA). Each file includes detailed columns that capture essential attributes and metrics, enabling thorough examination and correlation of GPS anomalies with NOTAM warnings.

    The FAA_and_ICAO_locations files include 21,383 records with identifiers, coordinates, and detailed facility information. This dataset serves as a reference for analyzing the geographical distribution of aviation facilities. The Flights_per_Hour_per_Grid file, with 74,219,036 records, provides hourly flight movement counts within specified grids, offering insights into air traffic patterns and potential disruptions. The GPS_Jumps_from_Routes data, comprising 5,878,276 records, documents deviations in flight paths, capturing metrics such as distances, speeds, and timestamps. This data is crucial for identifying potential GPS spoofing incidents by analyzing unusual jumps between consecutive data points.

    The GPS_Missing_Coordinates file, with 53,227 records, highlights periods of missing GPS signals, indicating possible GPS jamming events. This file includes start and end times, distances between known coordinates, and Navigation Integrity Category (NIC) values to assess data quality during null periods. The NOTAM_ICAO_GPS and NOTAM_USA files, with 30,161 and 234,206 records respectively, provide detailed information on NOTAM areas, including geographic extents, active periods, and categories. This allows for a comprehensive analysis of the spatial and temporal correlation between NOTAM warnings and GPS anomalies, facilitating a better understanding of the impact of GPS disruptions on aviation safety and operations.

    Summary Table

    Category | File Names | Total Records | Columns
    FAA and ICAO Locations | FAA_and_ICAO_locations.csv.zip, FAA_and_ICAO_locations.dpkg.zip | 21,383 | WKT, id, fid, Location_ID, ICAO_ID, IATA_ID, FAA_Location_Code, Facility_Type, Facility_Name, FAA_New_Location_Code, Coordinates, lat, lon, Region, Country_Code, Country, State_Id, State_Name, City, Location, Effective_Date, Site_Id, ADO, ARTCC_Id, ARTCC_Computer_ID, ARTCC_Name, Tie_In_FSS_Id, Tie_In_FSS_Name, NOTAM_Facility_Id, NOTAM_Service
    Flights per Hour per Grid | Flights_per_Hour_per_Grid-WKT.csv.zip, Flights_per_Hour_per_Grid-WKT.dpkg.zip | 74,219,036 | grid_id, hour, movement_count, geometry
    GPS Jumps from Routes (possible spoofing) | GPS_Jumps_from_Routes-2023.csv.zip, GPS_Jumps_from_Routes-2023.dpkg.zip | 5,878,276 | WKT, id, fid, icao24, callsign, time_before_spoofing, time_of_spoofing, distance, time_difference, speed_m_s, time_start, time_end
    GPS Missing Coordinates (possible jamming) | GPS_Missing_Coordinates-2023.csv.zip, GPS_Missing_Coordinates-2023.dpkg.zip | 53,227 | WKT, id, icao24, callsign, null_start_time, null_end_time, time_of_previous_not_null_coords, time_of_next_not_null_coords, between_coords_distance_m, null_duration_seconds, between_coords_duration_seconds, avg_nic, min_nic, max_nic, start_time, end_time, start_y, end_x, end_y, start_x
    NOTAM ICAO GPS | NOTAM_ICAO_GPS-2023.csv.zip, NOTAM_ICAO_GPS-2023.dpkg.zip | 30,161 | WKT, id, fid, notam_id, category_name, coordinates_center, radius_nm, radius_mod_nm, notam_number, accountability, location_id, icao_id, domestic_text, icao_text, type, category_id, time_start, time_end
    NOTAM USA | NOTAM_USA-2023.csv.zip, NOTAM_USA-2023.dpkg.zip | 234,206 | WKT, id, fid, notam_id, category_name, is_circle, coordinates_polygon, coordinates_center, radius_nm, faa_location_code, is_faa_location, location_id, is_restricted_area, restricted_area_id, restricted_area_code, category_id, message, notam_number, notam_accountability, moa, type, time_start, time_end

    Details

    All files are provided as zipped CSV. Where geographical information is present, a GPKG version is also available.

    1. FAA_and_ICAO_locations.csv.zip and FAA_and_ICAO_locations.dpkg.zip

    • Total Records: 21,383
    • Columns:
      • WKT: Well-Known Text representation of a point in the CSV file, or a geometry field in the DPKG file.
      • id: Unique identifier for each record.
      • fid: Feature identifier.
      • Location_ID: Identifier for the location.
      • ICAO_ID: ICAO (International Civil Aviation Organization) identifier.
      • IATA_ID: IATA (International Air Transport Association) identifier.
      • FAA_Location_Code: FAA location code.
      • Facility_Type: Type of facility (e.g., airport, heliport).
      • Facility_Name: Name of the facility.
      • FAA_New_Location_Code: New location code by FAA.
      • Coordinates: Coordinates of the location.
      • lat: Latitude of the location.
      • lon: Longitude of the location.
      • Region: Geographical region of the location.
      • Country_Code: Country code of the location.
      • Country: Country name of the location.
      • State_Id: State identifier.
      • State_Name: Name of the state.
      • City: City name.
      • Location: General location information.
      • Effective_Date: Effective date of the record.
      • Site_Id: Site identifier.
      • ADO: Airport District Office.
      • ARTCC_Id: ARTCC (Air Route Traffic Control Center) identifier.
      • ARTCC_Computer_ID: ARTCC computer identifier.
      • ARTCC_Name: Name of the ARTCC.
      • Tie_In_FSS_Id: Tie-in Flight Service Station identifier.
      • Tie_In_FSS_Name: Name of the Tie-in Flight Service Station.
      • NOTAM_Facility_Id: NOTAM (Notice to Airmen) facility identifier.
      • NOTAM_Service: Indicates if NOTAM service is available (Y/N).

    2. Flights_per_Hour_per_Grid-WKT.csv.zip and Flights_per_Hour_per_Grid-WKT.dpkg.zip

    • Total Records: 74,219,036
    • Columns:
      • grid_id: Identifier for the grid.
      • hour: Timestamp for the hour.
      • movement_count: Number of flights in the 0.5 x 0.5 degree grid cell during that hour of 2023.
      • geometry: Well-Known Text representation of a polygon in the CSV file, or a geometry field in the DPKG file.
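To make the 0.5 x 0.5 degree gridding concrete, the sketch below snaps a coordinate to the south-west corner of its cell and builds a WKT polygon for that cell. The listing does not say how `grid_id` is derived or where the grid is anchored, so the alignment to multiples of 0.5 degrees and both function names are assumptions made for illustration.

```python
import math

CELL_DEG = 0.5  # cell size stated in the column description


def cell_origin(lat, lon, cell=CELL_DEG):
    """South-west corner (lat, lon) of the cell containing the point.

    Assumes cells are aligned to multiples of `cell` degrees (hypothetical).
    """
    return (math.floor(lat / cell) * cell, math.floor(lon / cell) * cell)


def cell_wkt(lat, lon, cell=CELL_DEG):
    """WKT polygon for the cell containing the point."""
    south, west = cell_origin(lat, lon, cell)
    north, east = south + cell, west + cell
    # WKT writes coordinates as "x y" (lon lat); the ring closes on its first vertex.
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    return "POLYGON((" + ", ".join(f"{x:g} {y:g}" for x, y in ring) + "))"
```

For a point at latitude 40.7, longitude -73.9, `cell_origin` returns `(40.5, -74.0)`. Using `math.floor` rather than truncation keeps negative coordinates in the correct cell.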

    3. GPS_Jumps_from_Routes-2023.csv.zip and GPS_Jumps_from_Routes-2023.dpkg.zip

    • Total Records: 5,878,276
    • Columns:
      • WKT: Well-Known Text representation of a linestring in the CSV file, or a geometry field in the DPKG file.
      • id: Unique identifier for each record.
      • fid: Feature identifier.
      • icao24: ICAO 24-bit aircraft address.
      • callsign: Callsign of the aircraft.
      • time_before_spoofing: Timestamp before the spoofing event.
      • time_of_spoofing: Timestamp of the spoofing event.
      • distance: Distance of the jump in meters.
      • time_difference: Time difference between two coordinates in seconds.
      • speed_m_s: Speed in meters per second.
      • time_start: Start time of the record.
      • time_end: End time of the record.
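The columns above suggest that `speed_m_s` is the jump `distance` divided by `time_difference`. A minimal sketch under that assumption flags rows whose implied speed is implausibly high. The 350 m/s threshold and both function names are hypothetical and are not taken from the dataset.

```python
def implied_speed_m_s(distance_m, time_difference_s):
    """Speed implied by a jump; None when the time difference is not positive."""
    if time_difference_s <= 0:
        return None
    return distance_m / time_difference_s


def is_suspicious_jump(row, max_speed_m_s=350.0):
    """True if a row's implied speed exceeds the (hypothetical) threshold.

    `row` is a dict as produced by csv.DictReader, so values arrive as strings.
    """
    speed = implied_speed_m_s(float(row["distance"]), float(row["time_difference"]))
    return speed is not None and speed > max_speed_m_s
```

A row with `distance` 10000 and `time_difference` 10 implies 1000 m/s and is flagged, while 2500 m over 10 s (250 m/s) is not. Rows with a zero time difference are skipped rather than treated as infinite speed.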

    4. GPS_Missing_Coordinates-2023.csv.zip and GPS_Missing_Coordinates-2023.dpkg.zip

    • Total Records: 53,227
    • Columns:
      • WKT: Well-Known Text representation of a linestring in the CSV file, or a geometry field in the DPKG file.
      • id: Unique identifier for each record.
      • icao24: ICAO 24-bit aircraft address.
      • callsign: Callsign of the aircraft.
      • null_start_time: Start time of missing GPS

  8. Opendata AIG Brazil

    • kaggle.com
    zip
    Updated May 6, 2018
    Nosbielcs (2018). Opendata AIG Brazil [Dataset]. https://www.kaggle.com/nosbielcs/opendataaigbrazil
    Available download formats: zip (326278 bytes)
    Dataset updated
    May 6, 2018
    Authors
    Nosbielcs
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Opendata AIG Brazil | Datasets for analyzing aeronautical accidents and incidents in Brazil

    About the project

    Data model image: https://raw.githubusercontent.com/nosbielcs/opendata_aig_brazil/master/reference/data_model.png

    Dataset downloads

    1. Aeronautical occurrences (occurrences table, oco.csv): https://raw.githubusercontent.com/nosbielcs/opendata_aig_brazil/master/data/oco.csv
    2. Aircraft involved in occurrences (aircraft table, anv.csv): https://raw.githubusercontent.com/nosbielcs/opendata_aig_brazil/master/data/anv.csv
    3. Contributing factors (contributing factors table, ftc.csv): https://raw.githubusercontent.com/nosbielcs/opendata_aig_brazil/master/data/ftc.csv
    4. Safety recommendations (safety recommendations table, rec.csv): https://raw.githubusercontent.com/nosbielcs/opendata_aig_brazil/master/data/rec.csv

    Technical notes

    1. Text within columns is enclosed in double quotes ("").
    2. Table columns are separated by a tilde (~).
    3. Each table has a header row that identifies its columns.
    4. Each table has a column giving the date on which the data was extracted.
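The format described in these notes (tilde-separated columns, double-quoted text, a header row) can be read with Python's standard `csv` module. The sketch below is illustrative only. The column names and the sample row are made up and are not the real schema of oco.csv or the other tables.

```python
import csv
import io


def read_tilde_table(text_stream):
    """Parse a tilde-delimited, double-quoted table with a header row into dicts."""
    reader = csv.DictReader(text_stream, delimiter="~", quotechar='"')
    return list(reader)


# Hypothetical sample; the real files may use different column names.
sample = io.StringIO(
    '"id"~"cidade"~"dia_extracao"\n'
    '"1"~"RIO ~ CENTRO"~"2018-05-06"\n'
)
rows = read_tilde_table(sample)
print(rows[0]["cidade"])  # a tilde inside quotes stays part of the value
```

For a downloaded file, the same function can be given an open file object, for example `open("oco.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.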

    Other information "for dummies"

    1. Final reports can be consulted on the CENIPA website - Reports.
    2. Safety recommendations can be consulted on the CENIPA website - Recommendations.
    3. Scientific articles on the topic can be found in, or submitted to, the journal Revista Conexão SIPAER.

    Other resources

    Other databases to consult:

    1. NTSB (NTSB database): http://www.ntsb.gov/_layouts/ntsb.aviation/index.aspx
    2. BEA (BEA database): https://www.bea.aero/no_cache/les-enquetes/les-evenements-notifies/
    3. Wildlife (bird strike) risk (reports of wildlife risk events in Brazil): http://www.cenipa.aer.mil.br/cenipa/sigra/pesquisa_dadosExt
    4. Laser risk (reports of laser events in Brazilian aviation): http://www.cenipa.aer.mil.br/cenipa/raio_laser/pesquisa
    5. Balloon risk (reports of balloon releases that affect Brazilian aviation): http://www.cenipa.aer.mil.br/cenipa/baloeiro/pesquisa
    6. Brazilian aerodromes (list of Brazilian aerodromes published by DECEA): http://dados.gov.br/dataset/airport21jul16
    7. Brazilian airways

    Tips for making the best use of the datasets

    1. Before downloading the data, read the full text of this page carefully. It will guide you to a proper understanding of the relationships among the available datasets (occurrence, aircraft involved, contributing factor, and safety recommendations).
    2. To go deeper into the subject, visit the CENIPA website and review the legislation that governs the investigation and prevention of aeronautical accidents in Brazil.
    3. Get to know the SIPAER Investigation Manual. Its annexes contain a table of domains (taxonomy) for some of the variables available in the datasets.
    4. Because investigation work is dynamic and CENIPA wants to make data available quickly, the datasets are subject to change whenever they are updated. Whenever possible, therefore, use the datasets' "extraction date" to justify or reference your studies and analyses.
    5. Learn how to work with data in CSV format. Click here to learn.

    Help

    If questions remain, please send me an Issue (report a problem). Click here to report a problem.

  9. ICAO Aircraft Engine Emissions

    • kaggle.com
    Updated Nov 17, 2022
    Cite
    Ahmed Eltom (2022). ICAO Aircraft Engine Emissions [Dataset]. https://www.kaggle.com/datasets/ahmedeltom/icao-aircraft-engine-emissions
    Dataset updated
    Nov 17, 2022
    Dataset provided by
    Kaggle
    Authors
    Ahmed Eltom
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The ICAO Aircraft Engine Emissions Databank contains information on exhaust emissions of production aircraft engines, measured according to the procedures in ICAO Annex 16, Volume II, and, where noted, certified by the States of Design of the engines according to their national regulations. The databank covers engine types whose emissions are regulated, namely turbojet and turbofan engines with a static thrust greater than 26.7 kilonewtons. The information is provided by the engine manufacturers, who are solely responsible for its accuracy. The European Union Aviation Safety Agency (EASA) hosts the databank on behalf of ICAO and is not responsible for the contents.

    Engine manufacturers submit their data to the primary certificating authority (CA) for approval as part of the certification process. Once the data has been approved by the primary CA, manufacturers can voluntarily submit it to EASA for inclusion in the ICAO Engine Emissions Databank. The data must be submitted in a predefined format (see the Excel data template in the Downloads section below). The primary CA verifies that the data submitted to the databank conforms to the approved certification data. EASA then checks the data format and consistency before publishing it. The frequency of databank updates depends on the availability of new data, with a target of at least one update per year.

    Data submittals, comments and queries regarding the ICAO Engine Emissions Databank should be sent to emissions.databank@easa.europa.eu.

    The ICAO Aircraft Engine Emission Databank (EEDB) consists of two data worksheets/files labelled "Gaseous Emissions and Smoke" (ges.csv) and "nvPM Emissions" (nvpm.csv).
    A UID No. identifies the emissions information for each engine, and applies to both files for those engines with nvPM information available.
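Because the UID No. is shared by both files, the two can be combined per engine. The sketch below performs a left join from the gaseous/smoke rows to the nvPM rows and keeps nvPM values under a prefix, since the note that follows says common parameters may differ between the sheets. The key name "UID No" is taken from the ges.csv heading list; the function name, the `nvpm:` prefix, the `has_nvpm` flag, and the `fuel_flow` column in the usage note are hypothetical.

```python
def join_on_uid(ges_rows, nvpm_rows, key="UID No"):
    """Left-join nvPM rows onto gaseous/smoke rows by UID.

    Rows are dicts (e.g. from csv.DictReader). nvPM columns are copied under an
    "nvpm:" prefix so they never overwrite same-named gaseous/smoke columns.
    """
    nvpm_by_uid = {row[key]: row for row in nvpm_rows}
    joined = []
    for ges in ges_rows:
        merged = dict(ges)
        nvpm = nvpm_by_uid.get(ges[key])
        merged["has_nvpm"] = nvpm is not None
        if nvpm is not None:
            for col, val in nvpm.items():
                if col != key:
                    merged[f"nvpm:{col}"] = val
        joined.append(merged)
    return joined
```

With made-up rows such as `[{"UID No": "A1", "fuel_flow": "1.00"}]` on the gaseous side and `[{"UID No": "A1", "fuel_flow": "1.02"}]` on the nvPM side, the result keeps both values side by side (`fuel_flow` and `nvpm:fuel_flow`), which makes any deviation between the sheets visible rather than silently resolved.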

    Please note (Date: Jul 2021):
    • nvPM Emissions may not be available for engines that went out of production before 01/01/2020.
    • nvPM Emissions for a given engine may not have been gathered and reported at the same time as gaseous and smoke emissions for the same engine. Therefore the values of common parameters in both worksheets (e.g. fuel flows) may in some cases deviate between the two sheets (e.g. due to improved understanding of engine performance).

    Description of ges.csv:

    Heading | Description (if different from Heading)
    UID No | Unique Identification Number for an EEDB entry
    GSDB No | Gaseous and smoke emissions database number (continuous number for each set of gaseous/smoke emissions information assigned at the time of their first publication). Newest additions have the highest number.
    Manufacturer | Engine manufacturer
    Engine Identification |
    Combustor Description | Type of combustor where more than one type available on an engine
    Eng Type | Engine type. TF = turbofan, MTF = mixed turbofan
    B/P Ratio | Bypass ratio
    Pressure Ratio | Engine pressure ratio
    Rated Thrust (kN) | Engine maximum rated thrust, in kilonewtons
    Data Status | Data status - PR: Data generated prior to regulatio
    Data Superseded | Data for which a revised set has been supplied (data row shaded mid-grey in case of superseded data). Revised data is applicable to the same engine and includes e.g. data corrections or results from additional engine testing.
    Superseded by UID No | UID of data which replaced the superseded data.
    Test Engine Status | Test Engine Status - NME: Data from newly manufactured engines; DTEPS: Data from dedicated test engines to production standards; Other: Data from engines other than NME or DTEPS, see remarks
    Data corr as Annex 16 | The emissions data has been corrected according to ICAO Annex 16, Vol 2, Part III, Appendix 3
    Current Engine Status | Indicates if the engine is out of pro
    Current Engine Status Date | Date on which the engine ceased to be produced or became out of service (if applicable)
    HC EI T/O (g/kg) | Hydrocarbon emission index (g/kg) at take off condition
    HC EI C/O (g/kg) | Hydrocarbon emission index (g/kg) at climb out condition
    HC EI App (g/kg) | Hydrocarbon emission index (g/kg) at approach condition
    HC EI Idle (g/kg) | Hydrocarbon emission index (g/kg) at idle condition
    HC Number Test | Number of tests done for hydrocarbon
    HC Number Eng | Number of engines tested for hydrocarbon
    HC Dp/Foo Avg (g/kN) | Hydrocarbon Dp/Foo (g/kN) average
    HC Dp/Foo Sigma (g/kN) | Hydrocarbon Dp/Foo (g/kN) standard deviation
    HC Dp/Foo Min (g/kN) | Hydrocarbon Minimum value Dp/Foo (g/kN)
    HC Dp/Foo Max (g/kN) | Hydrocarbon Maximum value Dp/Foo (g/kN)
    HC Dp/Foo Characteristic (g/kN) | Hydrocarbon characteristic Dp/Foo value (g/kN)
    HC Dp/Foo Characteri...

Atharva Koshti (2025). Aeroplane Crash Data from 1919 to 2025 [Dataset]. https://www.kaggle.com/datasets/atharvakoshti/aeroplane-crash-data-from-1919-to-2025

Aeroplane Crash Data from 1919 to 2025

Dataset updated
Jul 4, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atharva Koshti
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

🛫 Airplane Crash Data (1919–2025) – Cleaned & Unified

📌 Overview

This dataset is a comprehensive and manually curated collection of global aviation accidents and incidents from 1919 to 2025, sourced from five authoritative platforms. It combines historical and modern records into a single, clean, and analysis-ready .csv file, ideal for data science, machine learning, and aviation safety research.

📂 Sources Used

The raw data was gathered from the following sources:

Each source had unique attributes, structures, and formats. I manually extracted, cleaned, de-duplicated, and unified the datasets to generate this high-quality final version.

🧹 Data Cleaning & Curation

The dataset preparation involved:

🧭 Date standardization across multiple formats (including parsing old historical dates)

🔍 Duplicate removal from overlapping sources

🛬 Location normalization (city, country, coordinates where possible)

📉 Fatality/injury counts harmonized into consistent columns

🧑‍✈️ Flight purpose categorization (commercial, military, training, etc.)

💥 Cause/description refinement to improve textual analysis usability

🏷️ Tagging & classification based on incident severity, aircraft type, etc.

📊 Columns in cleaned_data.csv (a combination of all the source databases, ready to work on)

Below is a typical structure of the dataset:

Column Name: Description
• Date: Date of the incident
• Location: City/Region/Country of the crash
• Operator: Airline or aircraft operator
• Flight No: Flight number (if available)
• Aircraft Type: Type/model of the aircraft
• Registration: Aircraft registration number
• Fatalities: Total number of fatalities
• Aboard: Total number of people on board
• Ground Fatalities: Number of people killed on the ground (if any)
• Summary: Short description or probable cause
• Source: Original source from which the data point was collected
• Crash Type: Categorized tag, e.g., mid-air collision, engine failure, pilot error, etc.
• Year: Extracted year (useful for trend analysis)

Note: Not all columns are present in each original file; where possible, missing data has been filled or marked appropriately.
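As an illustration of how the Year and Fatalities columns might be used for trend analysis, the sketch below totals fatalities per year while skipping rows where either value is missing or not numeric. It assumes the CSV header uses exactly the column names "Year" and "Fatalities" as listed above; the function name and the sample rows in the usage note are made up and are not taken from cleaned_data.csv.

```python
from collections import defaultdict


def fatalities_per_year(rows):
    """Sum fatalities by year, ignoring rows with missing or non-numeric values.

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    """
    totals = defaultdict(int)
    for row in rows:
        year = (row.get("Year") or "").strip()
        fatalities = (row.get("Fatalities") or "").strip()
        if not year or not fatalities:
            continue
        try:
            # int(float(...)) also accepts values written as "3.0"
            totals[int(float(year))] += int(float(fatalities))
        except ValueError:
            continue
    return dict(sorted(totals.items()))
```

Given made-up rows for 1950 with 10 and 5 fatalities, a 1951 row with an empty count, and a 1952 row with "3.0", the function returns totals only for 1950 and 1952. Skipping unparseable rows rather than treating them as zero avoids understating years where data is simply missing.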

🔍 Why This Dataset Is Unique

📅 Over a century of aviation data (1919–2025)

🔄 Merged from five reputable sources

🧼 Thorough manual cleaning and validation

📚 Useful for:

Aviation safety analysis

Time-series forecasting

Natural Language Processing (NLP) on crash summaries

Machine learning (e.g., predicting crash causes or fatalities)

📌 Suggested Use Cases

✈️ Predictive modeling of aviation risk

📉 Trend analysis in global air safety

🗺️ Geographic visualization of accident hotspots

🤖 NLP classification of crash summaries

📊 Dashboard creation in Power BI or Tableau

📁 File Included

cleaned_data.csv: Final cleaned dataset with unified schema
