Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
🛫 Airplane Crash Data (1919–2025) – Cleaned & Unified
📌 Overview
This dataset is a manually curated collection of global aviation accidents and incidents from 1919 to 2025, sourced from five platforms. It combines historical and modern records into a single, clean, analysis-ready .csv file, suited to data science, machine learning, and aviation safety research.
📂 Sources Used
The raw data was gathered from the following sources:
Each source had unique attributes, structures, and formats. I manually extracted, cleaned, de-duplicated, and unified the datasets to generate this final version.
🧹 Data Cleaning & Curation
The dataset preparation involved:
🧭 Date standardization across multiple formats (including parsing old historical dates)
🔍 Duplicate removal from overlapping sources
🛬 Location normalization (city, country, coordinates where possible)
📉 Fatality/injury counts harmonized into consistent columns
🧑✈️ Flight purpose categorization (commercial, military, training, etc.)
💥 Cause/description refinement to improve textual analysis usability
🏷️ Tagging & classification based on incident severity, aircraft type, etc.
📊 Columns in cleaned_data.csv
This file combines all of the source databases and is ready to work with. Below is a typical structure of the dataset:
| Column Name | Description |
|---|---|
| Date | Date of the incident |
| Location | City/Region/Country of the crash |
| Operator | Airline or aircraft operator |
| Flight No | Flight number (if available) |
| Aircraft Type | Type/model of the aircraft |
| Registration | Aircraft registration number |
| Fatalities | Total number of fatalities |
| Aboard | Total number of people on board |
| Ground Fatalities | Number of people killed on the ground (if any) |
| Summary | Short description or probable cause |
| Source | Original source from which the data point was collected |
| Crash Type | Categorized tag, e.g., mid-air collision, engine failure, pilot error |
| Year | Extracted year (useful for trend analysis) |
Note: Not all columns are present in each original file; where possible, missing data has been filled or marked appropriately.
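As a rough illustration of how the Year and Fatalities columns listed above might be used for trend analysis, the sketch below sums fatalities per year with Python's standard csv module. The sample rows are invented purely to mimic the documented column names, and the way blank counts are encoded in the real file is an assumption.

```python
import csv
import io
from collections import defaultdict

# Made-up rows that only mimic the documented column names; not real records.
SAMPLE_CSV = """Date,Operator,Fatalities,Aboard,Year
1950-01-01,Example Air,3,10,1950
1950-06-15,Sample Airways,,4,1950
1951-03-02,Example Air,7,7,1951
"""

def fatalities_per_year(handle):
    """Sum the Fatalities column per Year, skipping rows with unusable values."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        try:
            year = int(row["Year"])
            deaths = int(float(row["Fatalities"]))
        except (KeyError, ValueError, TypeError):
            continue  # blank, missing, or non-numeric value: leave the row out
        totals[year] += deaths
    return dict(sorted(totals.items()))

totals = fatalities_per_year(io.StringIO(SAMPLE_CSV))
print(totals)
```

To run it against the actual file, the `io.StringIO` handle could be swapped for `open("cleaned_data.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.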
🔍 Why This Dataset Is Unique
📅 Over a century of aviation data (1919–2025)
🔄 Merged from five reputable sources
🧼 Thorough manual cleaning and validation
📚 Useful for:
Aviation safety analysis
Time-series forecasting
Natural Language Processing (NLP) on crash summaries
Machine learning (e.g., predicting crash causes or fatalities)
📌 Suggested Use Cases
✈️ Predictive modeling of aviation risk
📉 Trend analysis in global air safety
🗺️ Geographic visualization of accident hotspots
🤖 NLP classification of crash summaries
📊 Dashboard creation in Power BI or Tableau
📁 File Included
cleaned_data.csv – Final cleaned dataset with unified schema
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Transport and communication are vital domains within the field of analytics, particularly in addressing safety and environmental concerns linked to the rapid growth of urban areas and increasing air traffic. Among the many risks aviation faces, bird strikes—collisions between aircraft and birds or other wildlife—pose a significant threat. These strikes can cause serious damage to aircraft, particularly jet engines, and have been responsible for some fatal accidents. Bird strikes are most likely to occur during critical flight phases such as take-off, climb, approach, and landing, when aircraft are at lower altitudes and bird activity is higher.
The dataset provided by the FAA, covering incidents from 2000 to 2011, offers a comprehensive overview of bird strikes in the U.S. It includes detailed visualizations and analyses across several key areas:
This dataset offers valuable insights into bird strike patterns, focusing on factors such as aircraft type, location, flight phase, and the specific species involved. By analyzing these variables, it helps identify risk factors and trends, supporting the development of strategies to reduce the frequency and impact of bird strikes, ultimately enhancing aviation safety and risk mitigation.
By US Open Data Portal, data.gov
This U.S. Government Works Aviation Safety Reports Dataset for Text Mining is part of the SIAM 2007 Text Mining Competition dataset which has been used to create algorithms to classify documents according to the types of problems described. The documents in this dataset consist of reports on incidents that occurred during certain flights and are collected from human-generated reports as part of the Aviation Safety Reporting System (ASRS). The files for this competition come in raw text format, with each row representing a single document and its associated problem type label.
This dataset offers insight into aviation safety incidents and is a useful resource for researchers developing text mining techniques that categorize documents by their contents. Analyzing these reports can help identify potential safety issues, both within individual aircraft operations and across the sector more broadly, improving the safety of domestic flying as ever more people travel by air.
This dataset contains aviation safety reports which have been labelled according to the type of problem that occurred during a certain flight. It is a great resource for developing text mining algorithms for document classification.
- Build an AI-powered Machine Learning classifier to identify problematic aviation incidents more quickly and accurately.
- Predict the risk of a particular flight, taking into consideration the type of incident that has occurred before on a similar flight.
- Construct an interactive searchable interface to allow users to better analyze and visualize aviation safety reports in order to uncover trends and suggest ways for improvement across all levels of relevant stakeholders within the sector, such as regulators, airlines, aircraft operators or pilots
If you use this dataset in your research, please credit the original authors.
Unknown License - Please check the dataset description for more information.
File: testtruth-csv-gz-3.csv
| Column name | Description |
|:--------------|:------------------------------------|
| -1 | Document Number (String) |
| -1.1 | Aircraft Autopilot Problem (String) |
| -1.2 | Auxiliary Power Problem (String) |
| -1.3 | Avionics Problem (String) |
| -1.4 | Cabin Pressure Problem (String) |
| -1.5 | Communications Problem (String) |
| -1.6 | Electrical System Problem (String) |
| -1.7 | Engine Problem (String) |
| -1.8 | Fire/Smoke Problem (String) |
| -1.9 | Fuel System Problem (String) |
| -1.10 | Ground Service Problem (String) |
| -1.11 | Hydraulic System Problem (String) |
| -1.12 | Ice/Frost Problem (String) |
| -1.13 | Landing Gear Problem (String) |
| -1.14 | Maintenance Problem (String) |
| -1.15 | Navigation Problem (String) |
| -1.16 | Oxygen System Problem (String) |
| -1.17 | Structural Problem (String) |
| -1.18 | Other Problem (String) |
File: traincategorymatrix-csv-gz-5.csv
| Column name | Description |
|:--------------|:------------------------------------|
| -1 | Document Number (String) |
| -1.1 | Aircraft Autopilot Problem (String) |
| -1.2 | Auxiliary Power Problem (String) |
| -1.3 | Avionics Problem (String) |
| -1.4 | Cabin Pressure Problem (String) |
| -1.5 | Communications Problem (String) |
| -1.6 | Electrical System Problem (String) |
| -1.7 | Engine Problem (String) |
| -1.8 | Fire/Smoke Problem (String) |
| -1.9 | Fuel System Problem (String) |
| -1.10 | Ground Service Problem (String) |
| -1.11 | Hydraulic System Problem (String) |
| -1.12 | Ice/Frost Problem (String) |
| -1.13 | Landing Gear Problem (String) |
| -1.14 | Maintenance Problem (String) |
| -1.15 | Navigation Problem (String) |
| -1.16 | Oxyg...
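The column names in these tables ("-1", "-1.1", ...) look like a data row that was read as a header, so one way to load such a category matrix is to treat every row as data and supply the label names separately. The sketch below does that with the standard csv module. It rests on two assumptions that the description does not confirm: that the files have no header row, and that a flag of 1 means the problem type applies while -1 means it does not. The sample rows are invented, and only the first three label names are used to keep the example short.

```python
import csv
import io

# Assumption: one label name per flag column, in the order of the table above
# (only the first three are listed here for brevity).
LABELS = [
    "Aircraft Autopilot Problem",
    "Auxiliary Power Problem",
    "Avionics Problem",
]

# Made-up rows: document number followed by one flag per label.
SAMPLE = "1,-1,1,-1\n2,1,1,-1\n"

def read_label_matrix(handle, labels):
    """Map each document number to the problem types whose flag is positive."""
    result = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        doc_id, flags = row[0], row[1:1 + len(labels)]
        # Assumption: 1 marks "applies", -1 marks "does not apply".
        result[doc_id] = [name for name, flag in zip(labels, flags) if float(flag) > 0]
    return result

doc_labels = read_label_matrix(io.StringIO(SAMPLE), LABELS)
print(doc_labels)
```

Because a report can carry several flags at once, the result is a list of labels per document, which suits a multi-label classification setup.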
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides detailed records of U.S. civil aviation accidents and incidents investigated by the National Transportation Safety Board (NTSB) from 1982 to the present. Converted from official NTSB public records, it includes structured CSV tables covering accident circumstances, aircraft and flight details, personnel information, outcomes, and probable cause determinations.
Contents:
Multiple CSV files (one per data table) such as: accidents, aircraft, flight crew, events, injuries, and more.
Each table is linked by a unique event identifier (ev_id), enabling relational analysis across files.
Key fields: event date and location, aircraft type, operator, weather conditions, phase of operation, damage extent, injury counts, and full “Probable Cause” narratives.
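Since the tables are linked by ev_id, relational analysis amounts to grouping child rows under their parent event. The following is a minimal sketch of that one-to-many join in plain Python. Only ev_id comes from the description above; the other field names (ev_date, acft_make) and all rows are hypothetical placeholders.

```python
from collections import defaultdict

# Made-up rows; only "ev_id" is a field name taken from the description.
events = [
    {"ev_id": "E001", "ev_date": "1982-01-01"},
    {"ev_id": "E002", "ev_date": "1982-01-02"},
]
aircraft = [
    {"ev_id": "E001", "acft_make": "Example"},
    {"ev_id": "E001", "acft_make": "Sample"},  # an event may involve two aircraft
    {"ev_id": "E002", "acft_make": "Example"},
]

def join_on_ev_id(parents, children, key="aircraft"):
    """Attach to each parent row the list of child rows sharing its ev_id."""
    by_event = defaultdict(list)
    for child in children:
        by_event[child["ev_id"]].append(child)
    return [{**parent, key: by_event.get(parent["ev_id"], [])} for parent in parents]

joined = join_on_ev_id(events, aircraft)
for row in joined:
    print(row["ev_id"], len(row["aircraft"]))
```

Keeping the child rows as a list, instead of flattening them, avoids duplicating event-level fields when an event has several related records.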
Use Cases:
Aviation safety analysis and risk modeling
Root cause and human factors studies
Training and benchmarking machine learning models on real-world accident data
Data visualization and exploratory research
Data Source and Provenance:
Source: U.S. National Transportation Safety Board (NTSB)
Original dataset (“avall.zip”/“avall.mdb”) publicly distributed by the NTSB and converted to CSV for accessibility
Time Coverage: January 1982 – Present (updated annually by the NTSB)
License: Public US government data – most suitable as CC0: Public Domain (verify for your purposes)
Notes:
Some records may have missing or redacted fields due to ongoing investigations or privacy protection.
Column-level documentation is available in the data explorer for each CSV file.
For full documentation of fields and code definitions, refer to the included data dictionaries or the NTSB website.
Typical Applications:
Accident rate trends by aircraft type or operator
Statistical or machine learning classification of accident causes
Visualization of accident distribution by geography, time, or weather conditions
Please cite the NTSB as the original data source in all derived works.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The dataset "2023 GPS Anomalies, NOTAMs, and Aircraft Traffic" was collected and generated for the paper "Detecting GPS Anomalies in Aviation Using ADS-B: Correlating Coordinate Gaps and GPS Deviations with NOTAM Warnings."
This dataset provides a collection of geospatial and temporal data necessary for analyzing potential GPS anomalies in aviation. The data sources include NOTAMs received from the FAA, and the aircraft traffic and GPS information calculated and extracted from the OpenSky Trino ADS-B database.
The FAA_and_ICAO_locations file includes 21,382 records with identifiers, coordinates, and detailed facility information. This dataset serves as a reference for analyzing the geographical distribution of aviation facilities. The Flights_per_Hour_per_Grid file, with 74,219,036 records, provides hourly flight movement counts within specified grids, offering insights into air traffic patterns and potential disruptions. The GPS_Jumps_from_Routes file, comprising 5,878,275 records, documents deviations in flight paths, capturing metrics such as distances, speeds, and timestamps. This data is crucial for identifying potential GPS spoofing incidents by analyzing unusual jumps between consecutive data points.
The GPS_Missing_Coordinates file, with 53,232 records, highlights periods of missing GPS signals, indicating possible GPS jamming events. This file includes start and end times, distances between known coordinates, and Navigation Integrity Category (NIC) values to assess data quality during null periods. The NOTAM_ICAO_GPS and NOTAM_USA files, with 30,160 and 234,205 records respectively, provide detailed information on NOTAM areas, including geographic areas, active periods, and categories. This allows for an analysis of the spatial and temporal correlation between NOTAM warnings and GPS anomalies, facilitating a better understanding of the impact of GPS disruptions on aviation safety and operations.
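To make the idea of "unusual jumps between consecutive data points" concrete, the sketch below computes the great-circle distance and implied speed between consecutive positions and flags hops that exceed a speed threshold. It is an illustration and not the paper's method: the 350 m/s threshold, the track layout, and the sample coordinates are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def find_jumps(track, max_speed_m_s=350.0):
    """Return (time_before, time_of_jump, distance_m, speed_m_s) for implausible hops.

    track is a time-sorted list of (unix_time, lat, lon) tuples.
    max_speed_m_s is an assumed plausibility limit, not a value from the dataset.
    """
    jumps = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamps carry no speed information
        distance = haversine_m(lat0, lon0, lat1, lon1)
        speed = distance / dt
        if speed > max_speed_m_s:
            jumps.append((t0, t1, distance, speed))
    return jumps

# Made-up track: the third point sits roughly 110 km away from its neighbours.
track = [(0, 52.00, 4.0), (10, 52.01, 4.0), (20, 53.00, 4.0), (30, 52.03, 4.0)]
jumps = find_jumps(track)
print(len(jumps), "suspicious hops")
```

Both the hop into the displaced position and the hop back out are reported, which is why a single bad point yields two entries.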
| Category | File Names | Total Records | Columns |
|---|---|---|---|
| FAA and ICAO Locations | FAA_and_ICAO_locations.csv, FAA_and_ICAO_locations.dpkg | 21,382 | WKT, id, fid, Location_ID, ICAO_ID, IATA_ID, FAA_Location_Code, Facility_Type, Facility_Name, FAA_New_Location_Code, Coordinates, lat, lon, Region, Country_Code, Country, State_Id, State_Name, City, Location, Effective_Date, Site_Id, ADO, ARTCC_Id, ARTCC_Computer_ID, ARTCC_Name, Tie_In_FSS_Id, Tie_In_FSS_Name, NOTAM_Facility_Id, NOTAM_Service |
| Flights per Hour per Grid | Flights_per_Hour_per_Grid-2023.csv, Flights_per_Hour_per_Grid-2023.dpkg | 74,219,036 | grid_id, hour, movement_count, geometry |
| GPS Jumps from Routes (possible spoofing) | GPS_Jumps_from_Routes-2023.csv, GPS_Jumps_from_Routes-2023.dpkg | 5,878,275 | WKT, id, fid, icao24, callsign, time_before_spoofing, time_of_spoofing, distance, time_difference, speed_m_s, time_start, time_end |
| GPS Missing Coordinates (possible jamming) | GPS_Missing_Coordinates-2023.csv, GPS_Missing_Coordinates-2023.dpkg | 53,232 | WKT, id, icao24, callsign, null_start_time, null_end_time, time_of_previous_not_null_coords, time_of_next_not_null_coords, between_coords_distance_m, null_duration_seconds, between_coords_duration_seconds, avg_nic, min_nic, max_nic, start_time, end_time, start_y, end_x, end_y, start_x |
| NOTAM ICAO GPS | NOTAM_ICAO_GPS-2023.csv, NOTAM_ICAO_GPS-2023.dpkg | 30,160 | WKT, id, fid, notam_id, category_name, coordinates_center, radius_nm, radius_mod_nm, notam_number, accountability, location_id, icao_id, domestic_text, icao_text, type, category_id, time_start, time_end |
| NOTAM USA | NOTAM_USA-2023.csv, NOTAM_USA-2023.dpkg | 234,205 | WKT, id, fid, notam_id, category_name, is_circle, coordinates_polygon, coordinates_center, radius_nm, faa_location_code, is_faa_location, location_id, is_restricted_area, restricted_area_id, restricted_area_code, category_id, message, notam_number, notam_accountability, moa, type, time_start, time_end |
Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
Description:
Explore a comprehensive dataset detailing the history of airplane crashes and fatalities worldwide from 1908 to 2023. This dataset encapsulates invaluable information for researchers, aviation enthusiasts, and safety experts interested in understanding the dynamics, trends, and patterns of aviation incidents over more than a century.
Content:
This dataset contains meticulously curated information on various aspects of airplane crashes, including but not limited to:
Insights and Analysis:
Through this dataset, delve into the trends and patterns that have shaped aviation safety and industry regulations over time. Analyze factors such as geographical distribution, aircraft models, airlines involved, weather conditions, and potential causes of crashes to derive valuable insights and contribute to enhanced safety protocols and risk management strategies.
Applications:
Safety Improvement: Identify recurring factors contributing to crashes and propose preventive measures.
Regulatory Enhancements: Inform policymakers and regulatory bodies to enact measures that ensure safer skies.
Research and Education: Enable academic research and educational resources to study aviation safety comprehensively.
Acknowledgments:
PlaneCrashInfo (https://www.planecrashinfo.com): A vital resource in the documentation and dissemination of aviation incident information, contributing significantly to the wealth of knowledge available for this dataset.
This dataset builds on the work of CGurkan (https://www.kaggle.com/datasets/cgurkan/airplane-crash-data-since-1908), who compiled the initial airplane crash data covering 1908 to 2019. We extend our sincere gratitude for their efforts in laying the groundwork for this dataset.
Furthermore, significant updates and enhancements have been made to the original dataset, incorporating additional information, refining existing records, and ensuring the dataset's accuracy and relevance over time. We're deeply appreciative of the collaborative spirit that underlies the improvement and evolution of this dataset.
We also acknowledge the contributions of various entities, including aviation safety organizations, researchers, data providers, and community members, whose collective dedication has enriched and expanded this dataset, making it a more robust resource for studying aviation incidents and promoting safety measures.
Your ongoing support and contributions are instrumental in advancing the understanding and improvement of aviation safety worldwide.
Usage Policy:
This dataset is provided for educational and analytical purposes. Users are encouraged to attribute the data source appropriately and contribute responsibly to the aviation safety domain.
Feedback and Collaboration:
We welcome feedback, contributions, and collaborations from the community to enhance the dataset's accuracy, completeness, and utility. Together, let's strive to make aviation safer for everyone.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The dataset "2023 GPS Anomalies, NOTAMs, and Aircraft Traffic" was collected and generated for the paper "Detecting GPS Anomalies in Aviation Using ADS-B: Correlating Coordinate Gaps and GPS Deviations with NOTAM Warnings."
This dataset provides a comprehensive collection of geospatial and temporal data necessary for analyzing potential GPS anomalies in aviation. The data sources include NOTAMs received from the FAA, and GPS information calculated and extracted from the OpenSky Trino ADS-B database. The dataset contains files such as FAA_and_ICAO_locations, Flights_per_Hour_per_Grid, GPS_Jumps_from_Routes (possible spoofing), GPS_Missing_Coordinates (possible jamming), and two sets of NOTAMs (ICAO and USA). Each file includes detailed columns that capture essential attributes and metrics, enabling thorough examination and correlation of GPS anomalies with NOTAM warnings.
The FAA_and_ICAO_locations files include 21,383 records with identifiers, coordinates, and detailed facility information. This dataset serves as a reference for analyzing the geographical distribution of aviation facilities. The Flights_per_Hour_per_Grid file, with 74,219,036 records, provides hourly flight movement counts within specified grids, offering insights into air traffic patterns and potential disruptions. The GPS_Jumps_from_Routes data, comprising 5,878,276 records, documents deviations in flight paths, capturing metrics such as distances, speeds, and timestamps. This data is crucial for identifying potential GPS spoofing incidents by analyzing unusual jumps between consecutive data points.
The GPS_Missing_Coordinates file, with 53,227 records, highlights periods of missing GPS signals, indicating possible GPS jamming events. This file includes start and end times, distances between known coordinates, and Navigation Integrity Category (NIC) values to assess data quality during null periods. The NOTAM_ICAO_GPS and NOTAM_USA files, with 30,161 and 234,206 records respectively, provide detailed information on NOTAM areas, including geographic extents, active periods, and categories. This allows for a comprehensive analysis of the spatial and temporal correlation between NOTAM warnings and GPS anomalies, facilitating a better understanding of the impact of GPS disruptions on aviation safety and operations.
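One simple way to picture the spatial and temporal correlation described here is to test whether a missing-coordinates period overlaps a NOTAM's active window while lying inside its circular area. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the NOTAM centre has already been parsed into center_lat and center_lon (both hypothetical names), that each gap has a representative lat and lon (also hypothetical), and that times are datetime objects. The field names null_start_time, null_end_time, time_start, time_end and radius_nm follow the column lists; all values are invented.

```python
import math
from datetime import datetime

EARTH_RADIUS_NM = 6371.0 / 1.852  # mean Earth radius converted from km to nautical miles

def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))

def intervals_overlap(start_a, end_a, start_b, end_b):
    """True when two closed time intervals share at least one instant."""
    return start_a <= end_b and start_b <= end_a

def gap_matches_notam(gap, notam):
    """True when the gap overlaps the NOTAM in time and falls inside its radius."""
    in_time = intervals_overlap(
        gap["null_start_time"], gap["null_end_time"],
        notam["time_start"], notam["time_end"],
    )
    in_area = distance_nm(
        gap["lat"], gap["lon"], notam["center_lat"], notam["center_lon"]
    ) <= notam["radius_nm"]
    return in_time and in_area

# Made-up NOTAM and gaps for illustration only.
notam = {
    "center_lat": 35.0, "center_lon": -117.0, "radius_nm": 60.0,
    "time_start": datetime(2023, 1, 1, 0, 0), "time_end": datetime(2023, 1, 1, 6, 0),
}
gap_inside = {"lat": 35.5, "lon": -117.0,
              "null_start_time": datetime(2023, 1, 1, 5, 50),
              "null_end_time": datetime(2023, 1, 1, 6, 10)}
gap_far = dict(gap_inside, lat=37.0)
gap_late = dict(gap_inside,
                null_start_time=datetime(2023, 1, 1, 7, 0),
                null_end_time=datetime(2023, 1, 1, 7, 5))

print(gap_matches_notam(gap_inside, notam),
      gap_matches_notam(gap_far, notam),
      gap_matches_notam(gap_late, notam))
```

NOTAMs described by polygons (the coordinates_polygon column) would need a point-in-polygon test instead of the radius check used here.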
| Category | File Names | Total Records | Columns |
|---|---|---|---|
| FAA and ICAO Locations | FAA_and_ICAO_locations.csv.zip, FAA_and_ICAO_locations.dpkg.zip | 21,383 | WKT, id, fid, Location_ID, ICAO_ID, IATA_ID, FAA_Location_Code, Facility_Type, Facility_Name, FAA_New_Location_Code, Coordinates, lat, lon, Region, Country_Code, Country, State_Id, State_Name, City, Location, Effective_Date, Site_Id, ADO, ARTCC_Id, ARTCC_Computer_ID, ARTCC_Name, Tie_In_FSS_Id, Tie_In_FSS_Name, NOTAM_Facility_Id, NOTAM_Service |
| Flights per Hour per Grid | Flights_per_Hour_per_Grid-WKT.csv.zip, Flights_per_Hour_per_Grid-WKT.dpkg.zip | 74,219,036 | grid_id, hour, movement_count, geometry |
| GPS Jumps from Routes (possible spoofing) | GPS_Jumps_from_Routes-2023.csv.zip, GPS_Jumps_from_Routes-2023.dpkg.zip | 5,878,276 | WKT, id, fid, icao24, callsign, time_before_spoofing, time_of_spoofing, distance, time_difference, speed_m_s, time_start, time_end |
| GPS Missing Coordinates (possible jamming) | GPS_Missing_Coordinates-2023.csv.zip, GPS_Missing_Coordinates-2023.dpkg.zip | 53,227 | WKT, id, icao24, callsign, null_start_time, null_end_time, time_of_previous_not_null_coords, time_of_next_not_null_coords, between_coords_distance_m, null_duration_seconds, between_coords_duration_seconds, avg_nic, min_nic, max_nic, start_time, end_time, start_y, end_x, end_y, start_x |
| NOTAM ICAO GPS | NOTAM_ICAO_GPS-2023.csv.zip, NOTAM_ICAO_GPS-2023.dpkg.zip | 30,161 | WKT, id, fid, notam_id, category_name, coordinates_center, radius_nm, radius_mod_nm, notam_number, accountability, location_id, icao_id, domestic_text, icao_text, type, category_id, time_start, time_end |
| NOTAM USA | NOTAM_USA-2023.csv.zip, NOTAM_USA-2023.dpkg.zip | 234,206 | WKT, id, fid, notam_id, category_name, is_circle, coordinates_polygon, coordinates_center, radius_nm, faa_location_code, is_faa_location, location_id, is_restricted_area, restricted_area_id, restricted_area_code, category_id, message, notam_number, notam_accountability, moa, type, time_start, time_end |
Note that all the files are zipped as CSV. The GPKG version is also available where geographical information is present.
1. FAA_and_ICAO_locations.csv.zip and FAA_and_ICAO_locations.dpkg.zip
2. Flights_per_Hour_per_Grid-WKT.csv.zip and Flights_per_Hour_per_Grid-WKT.dpkg.zip
3. GPS_Jumps_from_Routes-2023.csv.zip and GPS_Jumps_from_Routes-2023.dpkg.zip
4. GPS_Missing_Coordinates-2023.csv.zip and GPS_Missing_Coordinates-2023.dpkg.zip
GNU General Public License, version 2 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
Data model: https://raw.githubusercontent.com/nosbielcs/opendata_aig_brazil/master/reference/data_model.png
Dataset downloads
Technical notes
Other datasets for reference
Tips for making better use of the datasets
If you still have questions, please open an Issue to report the problem.
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The ICAO Aircraft Engine Emissions Databank contains information on exhaust emissions of production aircraft engines, measured according to the procedures in ICAO Annex 16, Volume II, and where noted, certified by the States of Design of the engines according to their national regulations. The databank covers engine types whose emissions are regulated, namely turbojet and turbofan engines with a static thrust greater than 26.7 kilonewtons. The information is provided by the engine manufacturers, who are solely responsible for its accuracy. The European Union Aviation Safety Agency (EASA) hosts the databank on behalf of ICAO and is not responsible for the contents.
Engine manufacturers submit their data to the primary certificating authority (CA) for approval as part of the certification process. Once the data has been approved by the primary CA, manufacturers can voluntarily submit it to EASA for inclusion in the ICAO Engine Emissions Databank. The data must be submitted in a predefined format (see Excel data template in below Downloads section). The primary CA verifies that the data submitted to the databank is in conformity with the approved data from certification. EASA then checks the data format and consistency before publishing it. The frequency of databank updates depends on the availability of new data but is aimed to be at least once a year.
Data submittals, comments and queries regarding the ICAO Engine Emissions Databank should be sent to emissions.databank@easa.europa.eu.
The ICAO Aircraft Engine Emission Databank (EEDB) consists of two data worksheets/files labelled "Gaseous Emissions and Smoke" (ges.csv) and "nvPM Emissions" (nvpm.csv).
A UID No. identifies the emissions information for each engine, and applies to both files for those engines with nvPM information available.
Please note:
"- nvPM Emissions may not be available for engines that went out of production before 01/01/2020.
- nvPM Emissions for a given engine may not have been gathered and reported at the same time as gaseous and smoke emissions for the same engine.
Therefore the values of common parameters in both worksheets (e.g. fuel flows) may in some cases deviate between the two sheets (e.g. due to improved understanding of engine performance)."
(Date: Jul 2021)
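Because the UID No applies to both files and shared parameters may deviate between them, a quick consistency check can be useful before combining the two sheets. The sketch below lists the engines that have no nvPM entry and the engines whose shared value differs by more than a relative tolerance. "UID No" comes from the description above. The field name "Fuel Flow T/O (kg/sec)", the 1% tolerance, and every value are assumptions made for illustration.

```python
import math

FIELD = "Fuel Flow T/O (kg/sec)"  # assumed name of a parameter present in both files

# Made-up rows standing in for ges.csv and nvpm.csv.
ges_rows = [
    {"UID No": "A", FIELD: "1.000"},
    {"UID No": "B", FIELD: "2.000"},
    {"UID No": "C", FIELD: "3.000"},
]
nvpm_rows = [
    {"UID No": "A", FIELD: "1.000"},
    {"UID No": "B", FIELD: "2.100"},
]

def index_by_uid(rows):
    """Index rows by their UID No for direct lookup."""
    return {row["UID No"]: row for row in rows}

def compare_shared_field(ges, nvpm, field, rel_tol=0.01):
    """Return (deviating, missing).

    deviating maps UID -> (ges_value, nvpm_value) where the two differ by more
    than rel_tol; missing lists UIDs present in ges but absent from nvpm.
    """
    ges_idx, nvpm_idx = index_by_uid(ges), index_by_uid(nvpm)
    missing = sorted(uid for uid in ges_idx if uid not in nvpm_idx)
    deviating = {}
    for uid in sorted(ges_idx.keys() & nvpm_idx.keys()):
        a, b = float(ges_idx[uid][field]), float(nvpm_idx[uid][field])
        if not math.isclose(a, b, rel_tol=rel_tol):
            deviating[uid] = (a, b)
    return deviating, missing

deviating, missing = compare_shared_field(ges_rows, nvpm_rows, FIELD)
print("deviating:", deviating)
print("no nvPM entry:", missing)
```

A missing nvPM entry is expected for some engines and is reported separately from a deviation, so neither case silently drops rows.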
Description of ges.csv:
| Heading | Description (if different from Heading) |
|---|---|
| UID No | Unique Identification Number for an EEDB entry |
| GSDB No | Gaseous and smoke emissions database number (continuous number for each set of gaseous/smoke emissions information assigned at the time of their first publication). Newest additions have the highest number. |
| Manufacturer | Engine manufacturer |
| Engine Identification | |
| Combustor Description | Type of combustor where more than one type available on an engine |
| Eng Type | Engine type. TF = turbofan, MTF = mixed turbofan |
| B/P Ratio | Bypass ratio |
| Pressure Ratio | Engine pressure ratio |
| Rated Thrust (kN) | Engine maximum rated thrust, in kilonewtons |
| Data Status | Data status - PR: Data generated prior to regulatio |
| Data Superseded | Data for which a revised set has been supplied (data row shaded mid-grey in case of superseded data). Revised data is applicable to the same engine and includes e.g. data corrections or results from additional engine testing. |
| Superseded by UID No | UID of data which replaced the superseded data. |
| Test Engine Status | Test Engine Status - NME: Data from newly manufactured engines; DTEPS: Data from dedicated test engines to production standards; Other: Data from engines other than NME or DTEPS, see remarks |
| Data corr as Annex 16 | The emissions data has been corrected according to ICAO Annex 16, Vol 2, Part III, Appendix 3 |
| Current Engine Status | Indicates if the engine is out of pro |
| Current Engine Status Date | Date on which the engine ceased to be produced or became out of service (if applicable) |
| HC EI T/O (g/kg) | Hydrocarbon emission index (g/kg) at take off condition |
| HC EI C/O (g/kg) | Hydrocarbon emission index (g/kg) at climb out condition |
| HC EI App (g/kg) | Hydrocarbon emission index (g/kg) at approach condition |
| HC EI Idle (g/kg) | Hydrocarbon emission index (g/kg) at idle condition |
| HC Number Test | Number of tests done for hydrocarbon |
| HC Number Eng | Number of engines tested for hydrocarbon |
| HC Dp/Foo Avg (g/kN) | Hydrocarbon Dp/Foo (g/kN) average |
| HC Dp/Foo Sigma (g/kN) | Hydrocarbon Dp/Foo (g/kN) standard deviation |
| HC Dp/Foo Min (g/kN) | Hydrocarbon Minimum value Dp/Foo (g/kN) |
| HC Dp/Foo Max (g/kN) | Hydrocarbon Maximum value Dp/Foo (g/kN) |
| HC Dp/Foo Characteristic (g/kN) | Hydrocarbon characteristic Dp/Foo value (g/kN) |
| HC Dp/Foo Characteri... |