5 datasets found
  1. Optimal Alarm Systems - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Optimal Alarm Systems - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/optimal-alarm-systems
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    An optimal alarm system is simply an optimal level-crossing predictor that can be designed to elicit the fewest false alarms for a fixed detection probability. It currently uses Kalman filtering for dynamic systems to provide a layer of predictive capability for the forecasting of adverse events. Predicted Kalman filter future process values and a fixed critical threshold can be used to construct a candidate level-crossing event over a predetermined prediction window. Because the alarm regions for an optimal level-crossing predictor cannot be expressed in closed form, one of our aims has been to investigate approximations for the design of an optimal alarm system. Approximations to this sort of alarm region are required for the most computationally efficient generation of a ROC curve or other similar alarm system design metrics. Algorithms based upon the optimal alarm system concept also require models that appeal to a variety of data mining and machine learning techniques. As such, we have investigated a serial architecture that preprocesses a full feature space using SVR (Support Vector Regression), implicitly reducing it to a univariate signal while retaining salient dynamic characteristics (see AIAA attachment below). This step was required due to current technical constraints, and is performed by using the residual generated by SVR (or potentially any regression algorithm), which has properties that are favorable for use as training data to learn the parameters of a linear dynamical system. Future development will lift these restrictions so as to allow for exposure to a broader class of models, such as a switched multi-input/output linear dynamical system in isolation based upon heterogeneous (both discrete and continuous) data, obviating the need for a preprocessing regression algorithm in serial.

    However, the use of a preprocessing multi-input/output nonlinear regression algorithm in serial with a multi-input/output linear dynamical system will allow the characterization of underlying static nonlinearities to be investigated as well. We will also investigate the use of non-parametric methods such as Gaussian process regression and particle filtering in isolation to lift the linear and Gaussian assumptions, which may be invalid for many applications. Future work will also involve improving the approximations inherent in the use of the optimal alarm system or optimal level-crossing predictor. We will also perform more rigorous testing and validation of the alarm systems discussed by using standard machine learning techniques, and consider more complex, yet practically meaningful, critical level-crossing events. Finally, a more detailed investigation of model fidelity with respect to available data and metrics has been conducted (see attachment below). As such, future work on modeling will involve the investigation of necessary improvements in initialization techniques and data transformations for a more feasible fit to the assumed model structure. Additionally, we will explore the integration of physics-based and data-driven methods in a Bayesian context, by using a more informative prior.
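
    The prediction step in this description can be illustrated with a minimal Python sketch. A scalar Kalman filter tracks a process, its multi-step predictions are compared with a fixed critical threshold over a prediction window, and an alarm is raised when a crossing looks likely enough. The model (a scalar state with assumed parameters a, q, r), the function names, and the per-step Gaussian tail rule are all assumptions made for illustration. This is a simple predictive alarm, not the optimal alarm region the description refers to.

```python
import math

def kalman_update(mean, var, y, a, q, r):
    """One predict/update cycle for x[k+1] = a*x[k] + w, y[k] = x[k] + v."""
    # Predict the state one step ahead.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Correct the prediction with the new measurement y.
    gain = pred_var / (pred_var + r)
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var

def predict_window(mean, var, a, q, steps):
    """Predicted (mean, variance) of the state for 1..steps steps ahead."""
    out = []
    for _ in range(steps):
        mean = a * mean
        var = a * a * var + q
        out.append((mean, var))
    return out

def exceed_prob(mean, var, level):
    """P(x > level) for x ~ N(mean, var)."""
    z = (level - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def alarm(mean, var, a, q, level, steps, p_alarm):
    """Raise an alarm if any step in the window is likely to exceed the level."""
    preds = predict_window(mean, var, a, q, steps)
    return any(exceed_prob(m, v, level) >= p_alarm for m, v in preds)

# Filter a short, made-up measurement sequence that drifts upward.
mean, var = 0.0, 1.0
for y in [0.2, 0.5, 0.9, 1.4, 2.0]:
    mean, var = kalman_update(mean, var, y, a=1.0, q=0.05, r=0.2)

print(alarm(mean, var, a=1.0, q=0.05, level=2.5, steps=5, p_alarm=0.2))
```

    In this sketch, lowering p_alarm makes the alarm more sensitive at the cost of more false alarms, so sweeping it is one way to trace the kind of trade-off a ROC curve summarizes.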

  2. Winter Olympics Prediction - Fantasy Draft Picks

    • kaggle.com
    zip
    Updated Jan 19, 2022
    Cite
    EricSBrown (2022). Winter Olympics Prediction - Fantasy Draft Picks [Dataset]. https://www.kaggle.com/datasets/ericsbrown/winter-olympics-prediction-fantasy-draft-picks
    Available download formats: zip (4928 bytes)
    Authors
    EricSBrown
    Description

    Olympic Draft Predictive Model

    Our family runs an Olympic Draft - similar to fantasy football or baseball - for each Olympic cycle. The purpose of this case study is to identify trends in medal count / point value to create a predictive analysis of which teams should be selected in which order.

    There are a few assumptions that will impact the final analysis:

    Point Value - Each medal is worth the following:
    Gold - 6 points
    Silver - 4 points
    Bronze - 3 points

    For analysis, reviewing the last 10 Olympic cycles. Winter Olympics only.
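
    The point values above translate directly into a scoring function. The short Python sketch below is only an illustration: the function name and the medal counts in the example are hypothetical.

```python
# Point values stated in the description.
POINTS = {"gold": 6, "silver": 4, "bronze": 3}

def draft_points(gold, silver, bronze):
    """Total draft points for one team in one Olympic cycle."""
    return gold * POINTS["gold"] + silver * POINTS["silver"] + bronze * POINTS["bronze"]

# Hypothetical medal counts, for illustration only.
print(draft_points(gold=2, silver=1, bronze=3))  # 2*6 + 1*4 + 3*3 = 25
```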

    All GDP numbers are in USD

    My initial hypothesis is that larger GDP per capita and larger contingent size are correlated with better point values for the Olympic draft.

    All Data pulled from the following Datasets:

    Winter Olympics Medal Count - https://www.kaggle.com/ramontanoeiro/winter-olympic-medals-1924-2018
    Worldwide GDP History - https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?end=2020&start=1984&view=chart

    GDP data was in wide format when downloaded from the World Bank. Opened the file in Excel, removed irrelevant years, and saved as .csv.

    Process

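    In RStudio, used the following code to convert wide data to long:

    install.packages("tidyverse")
    library(tidyverse)
    library(tidyr)  # already attached by tidyverse; kept from the original

    # Converting to long data from wide: every column except the two
    # identifier columns becomes a (year, value) pair
    long <- newgdpdata %>% gather(year, value, -c("Country Name", "Country Code"))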

    Completed these same steps for GDP per capita.
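
    For readers who do not use R, the same wide-to-long reshaping can be sketched in standard-library Python. The function name, the sample CSV, and its values are hypothetical; only the two identifier column names come from the R snippet above.

```python
import csv
import io

def wide_to_long(rows, id_cols):
    """Turn one row per country into one row per (country, year)."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col in id_cols:
                continue
            # Every non-identifier column is treated as a year column.
            long_rows.append({**ids, "year": col, "value": value})
    return long_rows

# Hypothetical two-row sample in the wide layout (values are made up).
sample = io.StringIO(
    "Country Name,Country Code,2014,2018\n"
    "Norway,NOR,100,110\n"
    "Canada,CAN,200,210\n"
)
long_rows = wide_to_long(list(csv.DictReader(sample)), ["Country Name", "Country Code"])
for r in long_rows:
    print(r)
```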

    Primary Key Creation

    The two databases hold differing types of data, and there is no good primary key to use. Used CONCAT to create a new key column in both, combining the year and country code into a unique identifier that matches between the datasets.

    SELECT *, CONCAT(year,country_code) AS "Primary" FROM medal_count

    Saved as new table "medals_w_primary"
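
    The purpose of the concatenated key is to let the medal table be matched against the GDP tables. A minimal Python sketch of that idea follows; the function names, the row layout, and the GDP placeholder value are assumptions, not the author's actual tables.

```python
def make_key(year, country_code):
    """Mirror CONCAT(year, country_code): e.g. 2018 and NOR -> '2018NOR'."""
    return f"{year}{country_code}"

def left_join(medals, gdp_by_key):
    """Attach GDP to each medal row; rows without a GDP match get None."""
    joined = []
    for row in medals:
        key = make_key(row["year"], row["country_code"])
        joined.append({**row, "Primary": key, "gdp": gdp_by_key.get(key)})
    return joined

# Hypothetical rows; the GDP figure is a placeholder, not real data.
medals = [
    {"year": 2018, "country_code": "NOR", "points": 120},
    {"year": 2018, "country_code": "XXX", "points": 3},
]
gdp_by_key = {make_key(2018, "NOR"): 1.0}

joined = left_join(medals, gdp_by_key)
for row in joined:
    print(row)
```

    Rows whose key has no match come back with gdp set to None, which makes missing or mismatched country codes easy to spot.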

    Utilized Excel to concatenate the primary key for GDP and GDP per capita utilizing:

    =CONCAT()

    Saved as new csv files.

    Uploaded all to SSMS.

    Contingent Size

    Next need to add contingent size.

    No existing database had this information. Pulled data from Wikipedia.

    2018 - No problem, pulled existing table.
    2014 - Table was not created. Pulled information into Excel, needed to convert the country NAMES into the country CODES.

    Created Excel document with all ISO Country Codes. Items were broken down between both formats, either 2 or 3 letters. Example:

    AF/AFG

    Used =RIGHT(C1,3) to extract only the country codes.

    For the country participants list in 2014, copied source data from Wikipedia and pasted as plain text (not HTML).

    Items then showed as: Albania (2)

    Broke cells using "(" as the delimiter to separate country names and numbers, then used find and replace to remove all parentheses from this data.

    We were left with: Albania 2

    Used VLOOKUP to create correct country code: =VLOOKUP(A1,'Country Codes'!A:D,4,FALSE)

    This worked for almost all items with a few exceptions that didn't match. Based on nature and size of items, manually checked on which items were incorrect.

    Chinese Taipei 3 #N/A
    Great Britain 56 #N/A
    Virgin Islands 1 #N/A

    This was relatively easy to fix by adding corresponding line items to the Country Codes sheet to account for future variability in the country code names.
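
    The spreadsheet steps above (extracting the 3-letter code, splitting "Albania (2)", looking up the code, and patching unmatched names) can be sketched in Python as follows. The function names and the tiny lookup table are hypothetical, and the alias codes chosen for the unmatched names are an assumption, since the description does not say which codes were entered.

```python
import re

def parse_entry(text):
    """Split 'Albania (2)' into ('Albania', 2)."""
    match = re.fullmatch(r"\s*(.+?)\s*\((\d+)\)\s*", text)
    if match is None:
        raise ValueError(f"unexpected entry: {text!r}")
    return match.group(1), int(match.group(2))

def code_from_pair(pair):
    """Mirror =RIGHT(C1,3): keep the 3-letter code from 'AF/AFG'."""
    return pair[-3:]

def lookup_codes(entries, codes):
    """Return matched (name, size, code) rows and the names with no code."""
    matched, missing = [], []
    for text in entries:
        name, size = parse_entry(text)
        if name in codes:
            matched.append((name, size, codes[name]))
        else:
            missing.append(name)  # the spreadsheet shows these as #N/A
    return matched, missing

# Small illustrative lookup table built from a 2-letter/3-letter pair.
codes = {"Albania": code_from_pair("AL/ALB")}
entries = ["Albania (2)", "Chinese Taipei (3)", "Great Britain (56)"]

matched, missing = lookup_codes(entries, codes)
print(missing)

# Add alias rows for names that differ from the ISO names (assumed codes),
# then run the lookup again.
codes.update({"Chinese Taipei": "TWN", "Great Britain": "GBR"})
matched, missing = lookup_codes(entries, codes)
print(matched)
```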

    Copied over to main sheet.

    Repeated this process for additional years.

    Once complete created sheet with all 10 cycles of data. In total there are 731 items.

    Data Cleaning

    Filtered by Country Code since this was an issue early on.

    Found a number of N/A Country Codes:

    Serbia and Montenegro, FR Yugoslavia, FR Yugoslavia, Czechoslovakia, Unified Team, Yugoslavia, Czechoslovakia, East Germany, West Germany, Soviet Union, Yugoslavia, Czechoslovakia, East Germany, West Germany, Soviet Union, Yugoslavia

    There appear to be issues with older codes, especially for Soviet bloc countries. Referred to historical data and filled in these country codes manually. Codes found on iso.org.

    Filled all in. The more difficult cases were the Unified Team of 1992 and the Soviet Union. For simplicity, used the code for Russia: the GDP data does not recognize the Soviet Union and breaks the union down into its constituent countries. Using Russia is a reasonable approximation for analysis that attempts to find trends.

    From here created a filter and scanned through the country names to ensure there were no obvious outliers. Found the following:

    Olympic Athletes from Russia[b] -- This is a one-off due to the recent PED controversy for Russia. Amended the Country Code to RUS to more accurately reflect the trends.

    Korea[a] and South Korea -- both were listed in 2018. This is due to the unified Korean team that competed. This is an outlier and does not warrant standing on its own as the 2022 Olympics will not have this team (as of this writing on 01/14/2022). Removed the COR country code item.
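
    The manual fixes described in this section amount to a small set of recoding and filtering rules. A minimal Python sketch of those rules is shown below. The row layout and sample rows are hypothetical, footnote markers such as [b] are assumed to have been stripped from the names already, and "OAR" is assumed as the pre-fix code purely for illustration.

```python
# Names the description recodes to Russia, and the code it drops.
RECODE_TO_RUS = {"Soviet Union", "Unified Team", "Olympic Athletes from Russia"}
DROP_CODES = {"COR"}  # unified Korean team, 2018 only

def clean(rows):
    """Apply the manual fixes described above to a list of row dicts."""
    cleaned = []
    for row in rows:
        if row["code"] in DROP_CODES:
            continue
        if row["name"] in RECODE_TO_RUS:
            row = {**row, "code": "RUS"}
        cleaned.append(row)
    return cleaned

# Hypothetical rows for illustration.
rows = [
    {"name": "Olympic Athletes from Russia", "code": "OAR", "year": 2018},
    {"name": "Korea", "code": "COR", "year": 2018},
    {"name": "South Korea", "code": "KOR", "year": 2018},
    {"name": "Unified Team", "code": None, "year": 1992},
]
for row in clean(rows):
    print(row["year"], row["name"], row["code"])
```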

    Confirmed Primary Key was created for all entries.

    Ran minimum and maximum years, no...

  3. TrajAir: A General Aviation Trajectory Dataset

    • kilthub.cmu.edu
    zip
    Updated May 30, 2023
    Cite
    Jay Patrikar; Brady Moon; Sourish Ghosh; Jean Oh; Sebastian Scherer (2023). TrajAir: A General Aviation Trajectory Dataset [Dataset]. http://doi.org/10.1184/R1/14866251.v1
    Available download formats: zip
    Dataset provided by
    Carnegie Mellon University
    Authors
    Jay Patrikar; Brady Moon; Sourish Ghosh; Jean Oh; Sebastian Scherer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General Aviation (GA) comprises all civil flights except scheduled passenger airline services. More than 90% of the roughly 220,000 civil aircraft registered in the United States (US) are GA aircraft. In contrast with airline service aircraft, which operate with two pilots in a structured higher-altitude operational envelope, GA aircraft are often individually piloted in a more unstructured lower-altitude environment. This low-altitude environment is also where the bulk of the next generation of Uncrewed Aerial Vehicles (UAVs) is expected to operate. These UAVs are expected to seamlessly interact with other UAVs and manned air traffic operating in this shared airspace. Nowhere is this manned-manned and potentially unmanned-manned interaction more pronounced than in low-altitude terminal airspace around airports. Low altitudes, multi-agent close-proximity interactions, dynamically changing conditions, and rapid decision making are hallmarks of this type of airspace, as compared to en-route airspace where agents are typically well separated.

    This dataset contains aircraft trajectories in an untowered terminal airspace collected over 8 months surrounding the Pittsburgh-Butler Regional Airport [ICAO:KBTP], a single-runway GA airport 10 miles north of the city of Pittsburgh, Pennsylvania. The trajectory data is recorded using an on-site setup that includes an ADS-B receiver. The trajectory data provided spans 18 Sept 2020 to 23 Apr 2021 and includes a total of 111 days of data, discounting downtime, repairs, and bad weather days with no traffic. Data is collected from 1:00 AM to 11:00 PM local time. The dataset uses an Automatic Dependent Surveillance-Broadcast (ADS-B) receiver placed within the airport premises to capture the trajectory data. The receiver uses both the 1090 MHz and 978 MHz frequencies to listen to these broadcasts. ADS-B uses satellite navigation to produce an accurate location and timestamp for the targets, which are recorded on-site using our custom setup.

    Weather data during the data collection period is also included for environmental context. The weather data is obtained post hoc using the METeorological Aerodrome Reports (METAR) strings generated by the Automated Weather Observing System (AWOS) at KBTP. The raw METAR string is then appended to the raw trajectory data by matching the closest UTC timestamps.

    We also provide processed data that filters, interpolates, and transforms data from a global frame to an airport-centred inertial frame. The inertial frame is centred at one end of the runway with the x-axis along the runway. Trajectories are filtered to aircraft below 6000 ft MSL and within a 5 km radius of the airport origin. We also remove duplicates and interpolate the data every second. The processed files also contain wind data, a crucial factor in decision-making, separated into components along and perpendicular to the runway direction.

    More Information and Supplemental Tools

    Please visit http://theairlab.org/trajair/ for more information.
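
    The processing described above (global frame to runway-aligned frame, altitude and radius filter, wind split into runway components) can be illustrated with a minimal Python sketch. This is not the dataset authors' code. The flat-earth approximation, the sign conventions, the function names, and the origin and heading used in the example are all assumptions, and the example coordinates are not the real KBTP values.

```python
import math

EARTH_RADIUS_M = 6371000.0

def to_runway_frame(lat, lon, origin_lat, origin_lon, runway_heading_deg):
    """Map lat/lon (degrees) to metres in a frame with x along the runway.

    Uses a flat-earth (equirectangular) approximation, which is reasonable
    only within a few kilometres of the origin.
    """
    east = math.radians(lon - origin_lon) * EARTH_RADIUS_M * math.cos(math.radians(origin_lat))
    north = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    h = math.radians(runway_heading_deg)  # clockwise from true north
    x = east * math.sin(h) + north * math.cos(h)   # along the runway
    y = -east * math.cos(h) + north * math.sin(h)  # to the left of the runway
    return x, y

def keep_point(x, y, alt_ft_msl, max_alt_ft=6000.0, max_radius_m=5000.0):
    """Filter from the description: under 6000 ft MSL and within 5 km."""
    return alt_ft_msl < max_alt_ft and math.hypot(x, y) <= max_radius_m

def wind_components(speed, wind_from_deg, runway_heading_deg):
    """Split wind into (along-runway headwind, crosswind from the right)."""
    angle = math.radians(wind_from_deg - runway_heading_deg)
    return speed * math.cos(angle), speed * math.sin(angle)

# Hypothetical origin and runway heading, not the real KBTP values.
x, y = to_runway_frame(40.01, -80.0, 40.0, -80.0, runway_heading_deg=0.0)
print(round(x), round(y), keep_point(x, y, alt_ft_msl=2500.0))
print(wind_components(10.0, wind_from_deg=90.0, runway_heading_deg=90.0))
```

    The wind direction here follows the usual METAR convention of reporting the direction the wind blows from, so a wind aligned with the runway heading comes out as a pure headwind.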

  4. Optimal Alarm Systems

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Optimal Alarm Systems [Dataset]. https://catalog.data.gov/dataset/optimal-alarm-systems
    Dataset provided by
    Dashlink
    Description

    An optimal alarm system is simply an optimal level-crossing predictor that can be designed to elicit the fewest false alarms for a fixed detection probability. It currently uses Kalman filtering for dynamic systems to provide a layer of predictive capability for the forecasting of adverse events. Predicted Kalman filter future process values and a fixed critical threshold can be used to construct a candidate level-crossing event over a predetermined prediction window. Because the alarm regions for an optimal level-crossing predictor cannot be expressed in closed form, one of our aims has been to investigate approximations for the design of an optimal alarm system. Approximations to this sort of alarm region are required for the most computationally efficient generation of a ROC curve or other similar alarm system design metrics. Algorithms based upon the optimal alarm system concept also require models that appeal to a variety of data mining and machine learning techniques. As such, we have investigated a serial architecture that preprocesses a full feature space using SVR (Support Vector Regression), implicitly reducing it to a univariate signal while retaining salient dynamic characteristics (see AIAA attachment below). This step was required due to current technical constraints, and is performed by using the residual generated by SVR (or potentially any regression algorithm), which has properties that are favorable for use as training data to learn the parameters of a linear dynamical system. Future development will lift these restrictions so as to allow for exposure to a broader class of models, such as a switched multi-input/output linear dynamical system in isolation based upon heterogeneous (both discrete and continuous) data, obviating the need for a preprocessing regression algorithm in serial.

    However, the use of a preprocessing multi-input/output nonlinear regression algorithm in serial with a multi-input/output linear dynamical system will allow the characterization of underlying static nonlinearities to be investigated as well. We will also investigate the use of non-parametric methods such as Gaussian process regression and particle filtering in isolation to lift the linear and Gaussian assumptions, which may be invalid for many applications. Future work will also involve improving the approximations inherent in the use of the optimal alarm system or optimal level-crossing predictor. We will also perform more rigorous testing and validation of the alarm systems discussed by using standard machine learning techniques, and consider more complex, yet practically meaningful, critical level-crossing events. Finally, a more detailed investigation of model fidelity with respect to available data and metrics has been conducted (see attachment below). As such, future work on modeling will involve the investigation of necessary improvements in initialization techniques and data transformations for a more feasible fit to the assumed model structure. Additionally, we will explore the integration of physics-based and data-driven methods in a Bayesian context, by using a more informative prior.

  5. Data from: Loan Default Dataset

    • kaggle.com
    zip
    Updated Jan 28, 2022
    Cite
    M Yasser H (2022). Loan Default Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/loan-default-dataset
    Available download formats: zip (5123932 bytes)
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Description:

    Banks earn major revenue from lending. But lending is often associated with risk: the borrower may default on the loan. To mitigate this, the banks have decided to use Machine Learning. They have collected past data on loan borrowers & would like you to develop a strong ML model to classify whether a new borrower is likely to default.

    The dataset is enormous & consists of multiple deterministic factors like the borrower's income, gender, loan purpose etc. The dataset is subject to strong multicollinearity & empty values. Can you overcome these factors & build a strong classifier to predict defaulters?

    Acknowledgements:

    This dataset was sourced from Kaggle.

    Objective:

    • Understand the dataset & clean it up (if required).
    • Build a classification model to predict whether the loan borrower will default or not.
    • Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.
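
    The cleanup step has to deal with the two problems named in the description, empty values and multicollinearity. The Python sketch below shows one simple way to approach both: median imputation, then flagging highly correlated column pairs. It is an illustration only. The column names, the values, and the 0.9 threshold are hypothetical and are not taken from the dataset.

```python
import math
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def collinear_pairs(columns, threshold=0.9):
    """List column pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical columns with made-up values; None marks an empty cell.
columns = {
    "income": impute_median([3000, None, 5200, 4100, 6000]),
    "loan_amount": impute_median([150, 210, 260, None, 300]),
    "age": impute_median([45, 23, 31, 52, 38]),
}
print(collinear_pairs(columns))
```

    Pairs flagged this way are candidates for dropping or combining before fitting a classifier; which column to keep is a modeling decision this sketch does not make.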