Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Symbolic data have become increasingly popular in the era of big data. In this paper, we consider density estimation and regression for interval-valued data, a special type of symbolic data that is common in astronomy and official statistics. We propose kernel estimators with adaptive bandwidths to account for the variability of each interval. Specifically, we derive cross-validation bandwidth selectors for density estimation and extend the Nadaraya–Watson estimator to regression with interval data. We assess the performance of the proposed methods in comparison with existing kernel methods through extensive simulation studies and real data analysis.
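As an illustration of the general idea (not the authors' exact estimator), the sketch below builds a Gaussian kernel density estimate in which each interval's half-range inflates a global bandwidth chosen by leave-one-out least-squares cross-validation; the function names, the Gaussian kernel, and the grid settings are assumptions made for this example.

```python
import numpy as np

def interval_kde(x_grid, lower, upper, h0):
    """Adaptive Gaussian KDE for interval-valued observations.

    Each interval [lower[i], upper[i]] is represented by its midpoint,
    and its bandwidth is the global bandwidth h0 inflated by the
    interval's half-range, so wider (more variable) intervals get
    flatter kernels.  Illustrative sketch only.
    """
    mid = (lower + upper) / 2.0
    half = (upper - lower) / 2.0
    h = np.sqrt(h0 ** 2 + half ** 2)                # per-interval bandwidth
    z = (x_grid[:, None] - mid[None, :]) / h        # standardized distances
    return np.mean(np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi)), axis=1)

def loo_cv_score(lower, upper, h0):
    """Leave-one-out least-squares cross-validation criterion for h0."""
    mid = (lower + upper) / 2.0
    grid = np.linspace(mid.min() - 3.0, mid.max() + 3.0, 512)
    fhat = interval_kde(grid, lower, upper, h0)
    ise_term = np.sum(fhat ** 2) * (grid[1] - grid[0])   # approximates the integral of fhat^2
    loo = [interval_kde(mid[i:i + 1], np.delete(lower, i), np.delete(upper, i), h0)[0]
           for i in range(len(mid))]
    return ise_term - 2.0 * np.mean(loo)

# Example: choose the global bandwidth over a small candidate grid
rng = np.random.default_rng(0)
centers = rng.normal(size=200)
halfwidths = rng.uniform(0.1, 0.5, size=200)
lo, hi = centers - halfwidths, centers + halfwidths
h_grid = np.linspace(0.05, 1.0, 20)
best_h = h_grid[np.argmin([loo_cv_score(lo, hi, h) for h in h_grid])]
density = interval_kde(np.linspace(-4, 4, 200), lo, hi, best_h)
```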
This table shows overall ATCEMS response interval performance for entire fiscal years. Data in the table is broken out by incident response priority and service area (City of Austin or Travis County).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This RR interval dataset is derived from 10,000 cases of 24-hour Holter monitoring data sampled at 128 Hz. Among the cases, 9,500 are labeled as non-atrial fibrillation (NAF) and 500 as paroxysmal atrial fibrillation (PAF). These data have been used in the article "Clinician-AI Collaboration: A Win-Win Solution for Efficiency and Reliability in Atrial Fibrillation Diagnosis". The dataset is formatted as CSV files with two columns:
rr_interval: the interval between consecutive R-peaks, measured in milliseconds.
label: a categorical label for each beat, where 1 indicates AF, 0 indicates NAF, and -1 indicates noise or artifacts.
Each case is named according to its category: NAF cases are labeled NAF0001.csv through NAF9500.csv, while PAF cases are labeled PAF0001.csv through PAF0500.csv. For any questions, please contact hustzp@hust.edu.cn.
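Assuming a local copy of the files described above (the file name and column names follow the description; nothing else is taken from the dataset itself), a minimal pandas sketch for loading and cleaning one case might look like this:

```python
import pandas as pd

# Load one NAF case; the layout follows the description above
# (columns rr_interval in milliseconds and label in {1, 0, -1}).
df = pd.read_csv("NAF0001.csv")

# Drop beats flagged as noise/artifacts (label -1) and convert the
# RR intervals to seconds, e.g. for instantaneous heart-rate analysis.
clean = df[df["label"] != -1].copy()
clean["rr_seconds"] = clean["rr_interval"] / 1000.0
print(clean["rr_seconds"].describe())
```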
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Abbot Point at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Portland Roads at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Bowen at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Motivated by a breast cancer study, we consider regression analysis of interval-censored failure time data in the presence of a random change point. Although a large literature on interval-censored data exists, there does not seem to be an established method that allows for random change points. Such data can occur in, for example, clinical trials where the risk of a disease may change dramatically once some biological indexes of the human body exceed certain thresholds. To fill this gap, we first consider regression analysis of such data under a class of linear transformation models and provide a sieve maximum likelihood estimation procedure. A penalized method is then proposed for simultaneous estimation and variable selection, and the asymptotic properties of the proposed method are established. An extensive simulation study is conducted and indicates that the proposed methods work well in practical situations. The approaches are applied to the real data from the breast cancer study mentioned above. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
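The abstract does not give implementation details, but a minimal simulation sketch can make the data structure concrete: below, failure times are generated under a proportional hazards model (one member of the linear transformation class) whose biomarker effect changes at a subject-specific random threshold, and the times are then interval-censored by periodic examinations. All parameter values, variable names, and the examination schedule are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Covariates: z is a biomarker whose effect changes once it exceeds a
# subject-specific random threshold tau (the random change point).
z = rng.uniform(0.0, 10.0, n)
x = rng.binomial(1, 0.5, n)
tau = rng.normal(5.0, 1.0, n)            # random change point
beta1, beta2, gamma = 0.3, 0.8, 0.5      # slopes before/after the change, treatment effect

# Proportional hazards (one linear transformation model) with a unit
# exponential baseline: T = E / exp(linear predictor), E ~ Exp(1).
lp = gamma * x + np.where(z <= tau, beta1 * z, beta1 * tau + beta2 * (z - tau))
T = rng.exponential(1.0, n) / np.exp(lp)

# Interval censoring by periodic examinations every 0.5 time units:
# each failure is only known to lie between the last visit before it
# and the first visit after it (right-censored past the last visit).
exam_times = np.arange(0.5, 5.01, 0.5)
L = np.zeros(n)
R = np.full(n, np.inf)
for i, t in enumerate(T):
    before = exam_times[exam_times < t]
    after = exam_times[exam_times >= t]
    if before.size:
        L[i] = before[-1]
    if after.size:
        R[i] = after[0]
```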
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
With the rapid development of data acquisition and storage, massive datasets with large sample sizes are emerging and urgently call for more advanced statistical tools. To accommodate such large volumes in the analysis, a variety of methods have been proposed for complete or right-censored survival data. However, existing big data methodology has not attended to interval-censored outcomes, which are ubiquitous in cross-sectional or periodic follow-up studies. In this work, we propose an easily implemented divide-and-combine approach for analyzing massive interval-censored survival data under the additive hazards model. We establish the asymptotic properties of the proposed estimator, including consistency and asymptotic normality. In addition, the divide-and-combine estimator is shown to be asymptotically equivalent to the full-data-based estimator obtained from analyzing all data together. Simulation studies suggest that, relative to the full-data-based approach, the proposed divide-and-combine approach has a clear advantage in computation time, making it more applicable to large-scale data analysis. An application to a set of interval-censored data also demonstrates the practical utility of the proposed method.
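The combining step of a divide-and-combine scheme can be sketched generically: each block is fitted separately and the block estimates are pooled by inverse-variance (information) weighting. The sketch below assumes a user-supplied fit_block routine (e.g., an additive-hazards fit for the interval-censored data in one block) returning a coefficient estimate and its covariance matrix; that routine, and all names used here, are assumptions rather than the authors' implementation.

```python
import numpy as np

def divide_and_combine(blocks, fit_block):
    """Generic divide-and-combine step: fit each data block separately,
    then pool the block estimates by inverse-variance weighting.

    fit_block(block) is assumed to return (beta_hat, cov_hat), e.g. from
    an additive-hazards fit to the interval-censored data in that block;
    the block-level estimation routine itself is not shown here.
    """
    infos, weighted = [], []
    for block in blocks:
        beta, cov = fit_block(block)
        info = np.linalg.inv(cov)              # block information matrix
        infos.append(info)
        weighted.append(info @ beta)
    total_info = np.sum(infos, axis=0)
    combined_cov = np.linalg.inv(total_info)
    combined_beta = combined_cov @ np.sum(weighted, axis=0)
    return combined_beta, combined_cov
```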
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Scarborough at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Southport at regular time intervals.
This chapter presents theoretical and practical aspects of the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows the probability of failure at future time instants (the RUL PDF) to be estimated in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using empirical knowledge about critical conditions for the system (also referred to as hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as has been shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it is helpful for providing quick insight into the manner in which the system reacts to changes in its input signals, in terms of its predicted RUL. The method is able to handle non-Gaussian PDFs since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault-seeded test showed that the proposed framework was able to anticipate modifications of the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will focus on the development and testing of similar strategies using different input-output uncertainty metrics.
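A highly simplified sketch of the prediction/update/prognosis cycle described above is given below, assuming a one-dimensional fault indicator with an exponential growth model, Gaussian measurement noise, and a fixed hazard-zone threshold; none of these modeling choices, parameter values, or function names come from the chapter itself.

```python
import numpy as np

rng = np.random.default_rng(2)

def propagate(x, dt=1.0):
    """Prediction step under an assumed fault-growth state model: the
    fault indicator grows at an uncertain exponential rate."""
    rate = 0.05 + 0.01 * rng.standard_normal(x.shape)
    return x * np.exp(rate * dt)

def update(particles, weights, y, meas_std=0.1):
    """Update step: re-weight particles by the likelihood of the new
    measurement y, then resample to avoid weight degeneracy."""
    lik = np.exp(-0.5 * ((y - particles) / meas_std) ** 2)
    weights = weights * lik
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

def predict_rul(particles, threshold, max_steps=200):
    """Long-term prediction: propagate each particle until the fault
    indicator crosses the hazard-zone threshold; the spread of the
    crossing times approximates the RUL PDF."""
    rul = np.full(len(particles), float(max_steps))
    x = particles.copy()
    for k in range(1, max_steps + 1):
        x = propagate(x)
        crossed_now = (x >= threshold) & (rul == max_steps)
        rul[crossed_now] = k
    return rul

# Filtering loop over a short synthetic measurement sequence
particles = rng.normal(1.0, 0.05, 1000)      # initial belief about fault size
weights = np.full(1000, 1.0 / 1000)
for y in [1.05, 1.12, 1.18, 1.27]:
    particles = propagate(particles)         # prediction step
    particles, weights = update(particles, weights, y)

rul_samples = predict_rul(particles, threshold=2.0)
ttf_expectation = rul_samples.mean()                   # TTF expectation
rul_interval = np.percentile(rul_samples, [5, 95])     # 90% confidence interval
```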
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
We present a Wilson interval for binomial proportions for use with multiple imputation for missing data. Using simulation studies, we show that it can have better repeated sampling properties than the usual confidence interval for binomial proportions based on Rubin’s combining rules. Further, in contrast to the usual multiple imputation confidence interval for proportions, the multiple imputation Wilson interval is always bounded by zero and one. Supplementary material is available online.
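As a rough illustration of the flavor of such an interval (not necessarily the authors' exact construction), the sketch below pools the imputed proportions with Rubin's rules, converts the pooled variance into an effective sample size, and plugs that into the standard Wilson formula, which keeps the resulting interval inside [0, 1]. Function and variable names are assumptions.

```python
import numpy as np
from scipy import stats

def mi_wilson_interval(phats, n, alpha=0.05):
    """Illustrative multiple-imputation Wilson-type interval.

    phats: proportion estimated in each completed (imputed) dataset.
    n:     sample size of each completed dataset.
    The pooled MI variance from Rubin's rules is converted into an
    effective sample size, which is plugged into the usual Wilson
    formula, so the limits always stay within [0, 1].  Sketch only.
    """
    phats = np.asarray(phats, dtype=float)
    m = len(phats)
    pbar = phats.mean()
    within = np.mean(phats * (1 - phats) / n)        # within-imputation variance
    between = phats.var(ddof=1)                      # between-imputation variance
    total = within + (1 + 1 / m) * between           # Rubin's total variance
    n_eff = pbar * (1 - pbar) / total                # implied effective sample size
    z = stats.norm.ppf(1 - alpha / 2)
    denom = 1 + z ** 2 / n_eff
    center = (pbar + z ** 2 / (2 * n_eff)) / denom
    half = z * np.sqrt(pbar * (1 - pbar) / n_eff + z ** 2 / (4 * n_eff ** 2)) / denom
    return center - half, center + half

# Example with m = 5 imputed datasets, each of size n = 200
lower, upper = mi_wilson_interval([0.12, 0.15, 0.13, 0.14, 0.16], n=200)
```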
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
Predicted water level heights at Cairns at regular time intervals.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
Predicted water level heights at Mackay at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Port Alma at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Rosslyn Bay at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Hay Point at regular time intervals.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
Predicted water level heights at Gladstone Auckland Point at regular time intervals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data Snooping (DS) is the best-established method for identifying gross errors (outliers) in geodetic data analysis with a given probability. The power of the test is the probability that DS correctly identifies a gross error, while the confidence level is the probability that DS does not reject an observation uncontaminated by a gross error. In practice, the power of the test is always unknown. Thus, the objective of this paper is to present a theoretical review of how to determine the minimum power of the test and bound values for the confidence level of the DS procedure in an n-dimensional scenario, i.e., considering all observations involved. Along with the theoretical review, a numerical example involving a simulated leveling network is presented. The results obtained in the experiments agreed with the previously calculated theoretical values, i.e., the revised methodology showed satisfactory performance in practice. The example also shows the importance of the revised methodology in the planning stage (or pre-analysis) of geodetic networks.
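For a single observation, the power of the two-sided w-test underlying data snooping has a closed form, and a crude Bonferroni bound gives a lower limit on the simultaneous confidence level when all n observations are tested. The sketch below illustrates these textbook quantities only and is not the paper's n-dimensional derivation.

```python
from scipy import stats

def w_test_power(delta, alpha0=0.001):
    """Power of a single two-sided w-test (Baarda data snooping): the
    probability that |w| exceeds the critical value when w ~ N(delta, 1),
    i.e. when the observation carries a standardized bias delta."""
    k = stats.norm.ppf(1 - alpha0 / 2)
    return 1 - (stats.norm.cdf(k - delta) - stats.norm.cdf(-k - delta))

def bonferroni_confidence_bound(alpha0, n_obs):
    """Crude lower bound on the simultaneous confidence level when n_obs
    w-tests are each carried out at significance level alpha0."""
    return max(0.0, 1 - n_obs * alpha0)

# Example: 20 observations, per-test alpha0 = 0.1%, a bias of 4 sigma
print(w_test_power(4.0))                       # power of detecting a 4-sigma bias
print(bonferroni_confidence_bound(0.001, 20))  # simultaneous confidence >= 0.98
```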
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicted water level heights at Townsville at regular time intervals.