This is a synthetic dataset, framed as a test group versus placebo group study, used in a report introducing Kaplan-Meier estimation and the Cox proportional hazards model.
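As a rough illustration of the Kaplan-Meier estimation mentioned above, the sketch below computes the product-limit survival curve from follow-up times and event indicators. The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the dataset or the report; the convention that subjects censored at time t are still counted as at risk at t is an assumption, although it is the usual one.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up time for each subject
    events:    1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Remove both events and censored subjects from the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five subjects, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for time, surv in toy_curve:
    print(f"t={time}: S(t)={surv:.3f}")
```

Running the estimator separately on the test group and the placebo group would give the two curves that such a report typically compares.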
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This example dataset is used to illustrate the usage of the R package survtd in the Supplementary Materials of the paper: Moreno-Betancur M, Carlin JB, Brilleman SL, Tanamas S, Peeters A, Wolfe R (2017). Survival analysis with time-dependent covariates subject to measurement error and missing data: Two-stage joint model using multiple imputation (submitted). The data were generated using the simjm function of the package, using the following code: dat
The dataset contains cardiovascular medical records taken from 299 patients. The patient cohort comprised 105 women and 194 men aged between 40 and 95 years. All patients in the cohort were diagnosed with systolic dysfunction of the left ventricle and had a previous history of heart failure. As a result of this history, every patient was classified into either class III or class IV of the New York Heart Association (NYHA) classification of the stages of heart failure.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Nonproportional hazards models often arise in biomedical studies, as evidenced by a recent national kidney transplant study. During the follow-up, the effects of baseline risk factors, such as patients’ comorbidity conditions collected at transplantation, may vary over time. To model such dynamic changes of covariate effects, time-varying survival models have emerged as powerful tools. However, traditional methods of fitting time-varying effects survival models rely on an expansion of the original dataset in a repeated measurement format, which, even with a moderate sample size, leads to an extremely large working dataset. Consequently, the computational burden increases quickly as the sample size grows, and analyses of a large dataset such as our motivating example defy any existing statistical methods and software. We propose a novel application of a quasi-Newton iteration method to model time-varying effects in survival analysis. We show that the algorithm converges superlinearly and is computationally efficient for large-scale datasets. We apply the proposed methods, via a stratified procedure, to analyze the national kidney transplant data and study the impact of potential risk factors on post-transplant survival. Supplementary materials for this article are available online.
C. gouldii mark-recapture data set: Mark-recapture data file needed to run the Bayesian survival analysis for the Chalinolobus gouldii example, as detailed in Supplement 4, page 8. File: CG_Bat_Dat.RData
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Cox models are commonly used in the analysis of time to event data. One advantage of Cox models is the ability to include time-varying covariates, often a binary covariate that codes for the occurrence of an event that affects an individual subject. A common assumption in this case is that the effect of the event on the outcome of interest is constant and permanent for each subject. In this paper we propose a modification to the Cox model to allow the influence of an event to exponentially decay over time. Methods for generating data using the inverse cumulative density function for the proposed model are developed. Likelihood ratio tests and AIC are investigated as methods for comparing the proposed model to the commonly used permanent exposure model. A more general model proposed by Cox and Oakes [1] is also discussed. A simulation study is performed and three different data sets are presented as examples.
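To make the inverse-transform idea concrete, here is a minimal sketch for the simpler permanent exposure comparator only, not for the decaying-effect model the paper proposes. It assumes a constant baseline hazard and a binary covariate that switches on at a known exposure time and stays on; under those assumptions the cumulative hazard is piecewise linear and can be inverted in closed form. The function name, parameter names, and numeric values are all hypothetical.

```python
import math
import random


def sample_event_time(u, baseline_hazard, beta, exposure_time):
    """Invert the cumulative hazard H(t) at the target value -log(u).

    Assumed hazard (illustrative only):
        h(t) = baseline_hazard              for t <= exposure_time
        h(t) = baseline_hazard * exp(beta)  for t >  exposure_time
    u must lie in (0, 1].
    """
    target = -math.log(u)  # an Exp(1) draw when u is uniform on (0, 1]
    hazard_after = baseline_hazard * math.exp(beta)
    cum_at_exposure = baseline_hazard * exposure_time
    if target <= cum_at_exposure:
        # The event occurs before the exposure switches on.
        return target / baseline_hazard
    # The event occurs after exposure: invert the second linear segment.
    return exposure_time + (target - cum_at_exposure) / hazard_after


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
times = [
    sample_event_time(1.0 - rng.random(), baseline_hazard=0.1,
                      beta=0.7, exposure_time=5.0)
    for _ in range(5)
]
print(times)
```

An effect that decays exponentially after the exposure makes the cumulative hazard nonlinear, so inverting it would generally call for a numerical root-finder in place of the closed-form step used here.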
Program RDSURVIV (Robust-Design-SURVIVal analysis) computes parameter estimates of survival, capture probability, and temporary emigration using models described in "Estimating Temporary Emigration using Capture-recapture Data with Pollock's Robust" (Kendall et al., 1997). RDSURVIV is a specially modified version of Dr. G. White's program SURVIV (White, 1983) that incorporates the robust-design models. With this program and its companion program, CNVRDSRV, users can obtain parameter estimates for these complex models from capture-history data without having to specify the cell probabilities. This program/method should be used in cases where a significant portion of the sampled population is unavailable for capture during some of the sampling periods. Ignoring the situation by using standard open-model Jolly-Seber analysis will result in biased estimates of population size and capture probability. For example, if the trapping area only allows the capture of breeders, then the animals that are non-breeders in a particular sample are "temporary emigrants" for that sample, since their probability of capture is zero.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The largest working dataset on leader survival, Archigos 4.1 (Goemans, Gleditsch, & Chiozza, 2009), focuses on the violent, dramatic means by which leaders may “exit” office. This information is vital for many research questions and its collection constitutes a valuable public good for the community. Yet, it provides an incomplete picture of the political rise and fall of world leaders. The burgeoning study of leaders using survival analysis requires a fine-grained understanding of not just when, but why and how leaders exit our datasets. We cannot, for example, conclude that a leader’s exit implies a successful application of international pressure if her removal stems from pre-set constitutional laws and the immediate successor has long been considered the heir apparent. The Regular Turnover Details dataset remedies this problem, as well as others. Two principal variables, Means and Successor, report information about the manner of each leader’s exit and the relationship between outgoing and incoming leaders. Together with supporting information about political pressure and apolitical figures, these data allow analysts to arbitrate between exits that suggest political failure and those that do not, identify nonpolitical leaders (such as interim and technocratic executives), and determine whether leaders constitute heirs to power or challengers thereof.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Code Examples from the Book
timedepsurvival18Sept2018: Text file containing data values for the analysis of embryonal survival in different stages. Please consult the R script "survivalscript19Sept2018.Rhistory" for explanations and an example.
survivalscript19Sept2018: R script for survival analysis, with explanations of the data file "timedepsurvival18Sept2018.txt".
timedepdevelopmentSept2018: Data file with intervals for the analysis of developmental rates. Variable names and an example can be found in "developmentscript19Sept2018.Rhistory".
developmentscript19Sept2018: R script with explanations of variable names in "timedepdevelopmentSept2018.txt" and an example of Cox mixed model fitting.
atrewettingdata19Sept
hatchingdata20Sept2018: Data on hatching probabilities.
secondhatching20Sept2018: Data on hatching probabilities at second rewetting.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset is the example data used for the paper by Ramjith et al. (2022), in which they describe flexible time-to-event models for double-interval-censored data with competing risks. The survival times are thus specified by two intervals, (L0time, R0time] and (L1time, R1time]. There is a state indicator for right-censoring (state=1), gametocyte initiation (state=2), and malaria clearance (state=3). There is also a special overlap indicator that records whether gametocytes were detected at the moment parasites were detected. All covariates analyzed are included here, from host characteristics to parasite characteristics.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Causal mediation analysis studies how the treatment effect of an exposure on outcomes is mediated through intermediate variables. Although many applications involve longitudinal data, the existing methods are not directly applicable to settings where the mediators are measured on irregular time grids. In this paper, we propose a causal mediation method that accommodates longitudinal mediators on arbitrary time grids and survival outcomes simultaneously. We take a functional data analysis perspective and view longitudinal mediators as realizations of underlying smooth stochastic processes. We define causal estimands of direct and indirect effects accordingly and provide corresponding identification assumptions. We employ a functional principal component analysis approach to estimate the mediator process, and propose a Cox hazard model for the survival outcome that flexibly adjusts the mediator process. We then derive a g-computation formula to express the causal estimands using the model coefficients. The proposed method is applied to a longitudinal data set from the Amboseli Baboon Research Project to investigate the causal relationships between early adversity, adult physiological stress responses, and survival among wild female baboons. We find that adversity experienced in early life has a significant direct effect on females' life expectancy and survival probability, but find little evidence that these effects were mediated by markers of the stress response in adulthood. We further developed a sensitivity analysis method to assess the impact of potential violation to the key assumption of sequential ignorability.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
University courses in statistical modeling often place great emphasis on methodological theory, illustrating it only briefly by means of limited and repeatedly used standard examples. Unfortunately, this approach often fails to actively engage and motivate students in their learning process. The teaching of statistical topics such as Bayesian survival analysis can be enhanced by focusing on innovative applications. Here, we discuss the visualization and modeling of a dataset of historical events comprising the post-election survival times of popes. Inference, prediction, and model checking are performed in the Bayesian framework, with comparisons being made with the frequentist approach. Further opportunities for similar statistical investigations are outlined. Supplementary materials for this article are available online.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
In many applications of survival analysis, the risk of an event occurring for one reason is dependent on the risk of the same event occurring for another reason. For example, when politicians suspect they might lose an election, they may strategically choose to retire. In such situations, the often-used multinomial logit model suffers from bias and underestimates the degree of strategic retirement, for example, to what extent poor prior electoral performance diminishes electoral prospects. To address this problem, the present article proposes a systematically dependent competing-risks (SDCR) model of survival analysis. Unlike the frailty model, the SDCR model can also deal with more than two risks. Monte Carlo simulation demonstrates how much the SDCR model reduces bias. Reanalysis of data on U.S. congressional careers (Box-Steffensmeier and Jones 2004) documents the strategic retirement of representatives, indicating that electoral pressure is more effective at turning out incumbents than previously recognized.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The current approach to modelling trap-dependence is compared to the traditional approach and to the model that ignores trap-dependence in a survival analysis of Cory's shearwaters (from [10]). Because there are transient individuals in this data set, two survival values are estimated: φ1, the apparent survival of newly-marked individuals, which is affected by the presence of transients, and φ2, the survival of previously marked individuals. Capture probability p is time-dependent-only in model (φ1,φ2, pt) and time- and trap-dependent in model (φ1,φ2, pt+m). In this last model, trap and time dependencies are additive. This model was fitted with the current approach, which considers trap-awareness states and with the traditional approach as in ([10] Model 5, Table 2), which involves the special preparation of the data detailed in [12]. The 95% confidence intervals are in parentheses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This paper studies the parking demand characteristics of large commercial areas in the city’s central regions. The study uses non-parametric and semi-parametric analysis methods in survival analysis to explore if and how weather conditions, parking tariffs, and temporal factors (weekdays, weekends, and short holidays) impact the parking duration. The parking data of a large commercial supermarket in Zhengzhou was collected over one month. Single-factor analysis based on the Product-Limit (PL) approach suggests that the cumulative survival and relative risk curves of parking duration exhibit slight variations across different temporal categories and weather conditions. Based on Cox semi-parametric multi-factor analysis results, the parking duration is significantly influenced by weekdays (regression coefficient = 0.068, hazard ratio = 1.071, P
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Employee Turnover’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/davinwijaya/employee-turnover on 28 January 2022.
--- Dataset description provided by original source is as follows ---
No, this is not about surviving a drowning or anything like that (just for illustration).
This Employee Turnover dataset is a real dataset shared on Edward Babushkin's blog and used to predict an employee's risk of quitting (with a survival analysis model). Edward Babushkin explained that "Survival Analysis is one of the most importance but it's not the most popular algorithm to predict employee turnover. Analysts use more familiar algorithms like Logistic Regression but, for example, Pasha Roberts writes: 'Don't use logistic methods to predict attrition!'. I think that we can only apply for a short-term situation like whether the employee has worked more or less than three months. If our goal is to predict individual quitting risks, then the best method is Survival Analysis."
All credit goes to Edward Babushkin for sharing this useful dataset.
This dataset can be used for predicting Employee Churn / Employee Turnover, Employee Survival Analysis, Uplift Modeling, or even Uplift Survival Analysis.
--- Original source retains full ownership of the source dataset ---
Ecologists and foresters have long noted a link between tree growth rate and mortality, and recent work suggests that interspecific differences in low-growth tolerance are a key force shaping forest structure. Little information is available, however, on the growth-mortality relationship for most species. We present three methods for estimating growth-mortality functions from readily obtainable field data. All use annual mortality rates and the recent growth rates of living and dead individuals. Annual mortality rates are estimated using both survival analysis and a Bayesian approach. Growth rates are obtained from increment cores. Growth-mortality functions are fitted using two parametric approaches and a non-parametric approach. The three methods are compared using bootstrapped confidence intervals and likelihood ratio tests. For two example species, Acer rubrum and Cornus florida, growth-mortality functions indicate a substantial difference in the two species' abilities to withstand slow growth. Both survival analysis and Bayesian estimates of mortality rates lead to similar growth-mortality functions, with the Bayesian approach providing a means to overcome the absence of long-term census data. In fitting growth-mortality functions, the non-parametric approach reveals that inflexibility in parametric methods can lead to errors in estimating mortality risk at low growth. We thus suggest that non-parametric fits be used as a tool for assessing parametric models.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Biological rhythms allow organisms to compartmentalise and coordinate aspects of their life with the predictable daily rhythms of their environment. There is increasing recognition that understanding the biological rhythms of mosquitoes that transmit parasites is important for global health. For example, perturbations in blood foraging rhythms as a consequence of vector control measures may undermine disease control. To address this, we explore the impacts of altered timing of blood feeding on mosquito life history traits and malaria transmission. We carried out three experiments in which Anopheles stephensi mosquitoes were fed in the morning or evening on blood that had different qualities, including: chemical or Plasmodium chabaudi infection-induced anaemia, Plasmodium berghei infection but no anaemia, or originating from hosts at different times of day. We then compared mosquito fitness proxies relating to survival and reproduction, and malaria transmission proxies where relevant, using mixed-effects models and survival analysis. Mosquito lifespan is not influenced by the time of day at which mosquitoes received a blood meal, but several reproductive metrics are affected in some experiments. Overall, receiving a blood meal in the morning makes mosquitoes more likely to lay eggs. Furthermore, in the experiment with the largest sample size, morning-fed mosquitoes laid sooner and had a larger clutch size. In keeping with previous work, P. berghei infection reduces mosquito lifespan and the likelihood of laying eggs, but time of day of blood feeding does not impact upon these metrics nor on parasite transmission. The time of day of blood feeding does not appear to have major consequences for mosquito fitness or transmission of asynchronous malaria species.
If our results from a lab colony of mosquitoes living in benign conditions hold for wild mosquitoes, they suggest that mosquitoes have sufficient flexibility in their physiology to cope with changes in biting time induced by evading insecticide-treated bed nets. Future work should consider the impact of multiple feeding cycles and the abiotic stresses imposed by the need to forage for blood when hosts are not protected by bed nets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
In epidemiology and clinical research, recurrent events refer to individuals who are likely to experience transient clinical events repeatedly over an observation period. Examples include hospitalizations in patients with heart failure, fractures in osteoporosis studies and the occurrence of new lesions in oncology. We provided an in-depth analysis of the sample size required for the analysis of recurrent time-to-event data using multifrailty or multilevel survival models. We covered the topic from the simple shared frailty model to models with hierarchical or joint frailties. We relied on a Wald-type test statistic to estimate the sample size assuming either a single or multiple endpoints. Simulations revealed that the sample size increased as heterogeneity increased. We also observed that it was more attractive to include more patients and reduce the duration of follow-up than to include fewer patients and increase the duration of follow-up to obtain the number of events required. Each model investigated can address the question of the number of subjects for recurrent events. However, depending on the research question, one model will be more suitable than another. We illustrated our methodology with the AFFIRM-AHF trial investigating the effect of intravenous ferric carboxymaltose in patients hospitalised for acute heart failure.