==Introduction to survival analysis==

Survival analysis is used in several ways:

*To describe the survival times of members of a group 
**[[Life table]]s 
**[[Kaplan–Meier estimator|Kaplan–Meier curves]]
**[[Survival function]]
**[[Failure rate|Hazard function]]
*To compare the survival times of two or more groups 
**[[Log-rank test]]
*To describe the effect of categorical or quantitative variables on survival
**[[Proportional hazards model|Cox proportional hazards regression]]
**Parametric survival models
**Survival trees 
**Survival random forests

===Definitions of common terms in survival analysis===

The following terms are commonly used in survival analyses:

*Event: Death, disease occurrence, disease recurrence, recovery, or other experience of interest
*Time: The time from the beginning of an observation period (such as surgery or beginning treatment) to (i) an event, or (ii) end of the study, or (iii) loss of contact or withdrawal from the study.
*Censoring / Censored observation: Censoring occurs when we have some information about individual survival time, but we do not know the survival time exactly. The subject is censored in the sense that nothing is observed or known about that subject after the time of censoring. A censored subject may or may not have an event after the end of observation time.
*[[Survival function]] S(t): The probability that a subject survives longer than time t.
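The definitions of time and censoring above can be made concrete with a small sketch. It is an illustration only: the function name <code>observed_time_and_status</code> and its arguments are hypothetical, and it assumes that all times are measured from the subject's own start of observation.

```python
from typing import Optional, Tuple


def observed_time_and_status(
    event_time: Optional[float],
    dropout_time: Optional[float],
    study_end: float,
) -> Tuple[float, int]:
    """Return (time, status) for one subject; status 1 = event, 0 = censored.

    None means that the event (or the dropout) never happened.
    """
    # Observation stops at the earliest of: event, dropout, end of study.
    candidates = [(study_end, 0)]
    if dropout_time is not None:
        candidates.append((dropout_time, 0))
    if event_time is not None:
        candidates.append((event_time, 1))
    # On ties, prefer the event (status 1) over censoring (status 0).
    return min(candidates, key=lambda c: (c[0], -c[1]))


# A subject with an event at week 9, and one lost to follow-up at week 13.
print(observed_time_and_status(9, None, 52))
print(observed_time_and_status(None, 13, 52))
```

A subject whose event would occur after the end of the study is recorded as censored at the end of the study, which is why a censored subject may or may not have an event later.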

===Example: Acute myelogenous leukemia survival data===

This example uses the [[Acute myeloid leukemia|Acute Myelogenous Leukemia]] survival data set "aml" from the "survival" package in R. The data set is from Miller (1997)<ref name="Miller1997">{{Citation
|last1= Miller
|first1= Rupert G.
|title= Survival analysis
|year=1997
|publisher= John Wiley & Sons
|isbn=0-471-25218-2
}}
</ref> and the question is whether the standard course of chemotherapy should be extended ('maintained') for additional cycles.

The aml data set sorted by survival time is shown in the box.
{| class="wikitable mw-collapsible"
|+aml data set sorted by survival time
!observation
!time (weeks)
!status
!x
|-
|12
|5
|1
|Nonmaintained
|-
|13
|5
|1
|Nonmaintained
|-
|14
|8
|1
|Nonmaintained
|-
|15
|8
|1
|Nonmaintained
|-
|1
|9
|1
|Maintained
|-
|16
|12
|1
|Nonmaintained
|-
|2
|13
|1
|Maintained
|-
|3
|13
|0
|Maintained
|-
|17
|16
|0
|Nonmaintained
|-
|4
|18
|1
|Maintained
|-
|5
|23
|1
|Maintained
|-
|18
|23
|1
|Nonmaintained
|-
|19
|27
|1
|Nonmaintained
|-
|6
|28
|0
|Maintained
|-
|20
|30
|1
|Nonmaintained
|-
|7
|31
|1
|Maintained
|-
|21
|33
|1
|Nonmaintained
|-
|8
|34
|1
|Maintained
|-
|22
|43
|1
|Nonmaintained
|-
|9
|45
|0
|Maintained
|-
|23
|45
|1
|Nonmaintained
|-
|10
|48
|1
|Maintained
|-
|11
|161
|0
|Maintained
|}

* Time is indicated by the variable "time", which is the survival or censoring time
* Event (recurrence of AML) is indicated by the variable "status". 0{{nbsp}}= no event (censored), 1{{nbsp}}= event (recurrence)
* Treatment group: the variable "x" indicates if maintenance chemotherapy was given

The last observation (11), at 161 weeks, is censored. Censoring indicates that the patient had not had an event (no recurrence of AML) by the end of their observation time. Another subject, observation 3, was censored at 13 weeks (indicated by status=0). This subject was in the study for only 13 weeks, and the AML did not recur during those 13 weeks. It is possible that this patient was enrolled near the end of the study, so that they could be observed for only 13 weeks. It is also possible that the patient was enrolled early in the study, but was lost to follow-up or withdrew from the study. The table shows that other subjects were censored at 16, 28, and 45 weeks (observations 17, 6, and{{nbsp}}9 with status=0). The remaining subjects all experienced events (recurrence of AML) while in the study. The question of interest is whether recurrence occurs later in maintained patients than in non-maintained patients.

====Kaplan–Meier plot for the aml data====

The [[survival function]] ''S''(''t'') is the probability that a subject survives longer than time ''t''. ''S''(''t'') is theoretically a smooth curve, but it is usually estimated using the [[Kaplan–Meier estimator|Kaplan–Meier]] (KM) curve. The graph shows the KM plot for the aml data and can be interpreted as follows:

*The ''x'' axis is time, from zero (when observation began) to the last observed time point.
*The ''y'' axis is the proportion of subjects surviving. At time zero, 100% of the subjects are alive without an event.
*The solid line (similar to a staircase) shows the progression of event occurrences.
*A vertical drop indicates an event. In the aml table shown above, two subjects had events at five weeks, two had events at eight weeks, one had an event at nine weeks, and so on. These events at five weeks, eight weeks and so on are indicated by the vertical drops in the KM plot at those time points.
*At the far right end of the KM plot there is a tick mark at 161 weeks. The vertical tick mark indicates that a patient was censored at this time. In the aml data table five subjects were censored, at 13, 16, 28, 45 and 161 weeks. There are five tick marks in the KM plot, corresponding to these censored observations.

====Life table for the aml data====

A [[life table]] summarizes survival data in terms of the number of events and the proportion surviving at each event time point. The life table for the aml data, created using the R{{nbsp}}software, is shown.

{| class="wikitable mw-collapsible"
|+ Life Table for the aml Data
|-
! time !! n.risk !! n.event !! survival !! std.err !! lower 95% CI !! upper 95% CI
|-
| 5 || 23 || 2 || 0.913 || 0.0588 || 0.8049 || 1
|-
| 8 || 21 || 2 || 0.8261 || 0.079 || 0.6848 || 0.996
|-
| 9 || 19 || 1 || 0.7826 || 0.086 || 0.631 || 0.971
|-
| 12 || 18 || 1 || 0.7391 || 0.0916 || 0.5798 || 0.942
|-
| 13 || 17 || 1 || 0.6957 || 0.0959 || 0.5309 || 0.912
|-
| 18 || 14 || 1 || 0.646 || 0.1011 || 0.4753 || 0.878
|-
| 23 || 13 || 2 || 0.5466 || 0.1073 || 0.3721 || 0.803
|-
| 27 || 11 || 1 || 0.4969 || 0.1084 || 0.324 || 0.762
|-
| 30 || 9 || 1 || 0.4417 || 0.1095 || 0.2717 || 0.718
|-
| 31 || 8 || 1 || 0.3865 || 0.1089 || 0.2225 || 0.671
|-
| 33 || 7 || 1 || 0.3313 || 0.1064 || 0.1765 || 0.622
|-
| 34 || 6 || 1 || 0.2761 || 0.102 || 0.1338 || 0.569
|-
| 43 || 5 || 1 || 0.2208 || 0.0954 || 0.0947 || 0.515
|-
| 45 || 4 || 1 || 0.1656 || 0.086 || 0.0598 || 0.458
|-
| 48 || 2 || 1 || 0.0828 || 0.0727 || 0.0148 || 0.462
|}

The columns in the life table have the following interpretation:

*time gives the time points at which events occur.
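*n.risk is the number of subjects at risk immediately before the time point, t. Being "at risk" means that the subject has not had an event before time t, and has not been censored before time t. A subject censored exactly at time t is still counted as at risk at t; for example, the 17 subjects at risk at 13 weeks include observation 3, which was censored at 13 weeks.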
*n.event is the number of subjects who have events at time t.
*survival is the proportion surviving, as determined using the Kaplan–Meier product-limit estimate.
*std.err is the standard error of the estimated survival. The standard error of the Kaplan–Meier product-limit estimate is calculated using Greenwood's formula, and depends on the number at risk (n.risk in the table), the number of deaths (n.event in the table) and the proportion surviving (survival in the table).
*lower 95% CI and upper 95% CI are the lower and upper 95% confidence bounds for the proportion surviving.
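The columns described above can be reproduced with a short calculation. The following is a minimal sketch in plain Python rather than the R code that produced the table; the function name <code>life_table</code> is illustrative. It assumes that the confidence bounds are computed on the log scale and capped at 1, which appears to be consistent with the values in the table, and it uses the data transcribed from the aml table.

```python
import math

# aml data as (time in weeks, status) pairs; status 1 = event, 0 = censored.
AML = [(5, 1), (5, 1), (8, 1), (8, 1), (9, 1), (12, 1), (13, 1), (13, 0),
       (16, 0), (18, 1), (23, 1), (23, 1), (27, 1), (28, 0), (30, 1),
       (31, 1), (33, 1), (34, 1), (43, 1), (45, 0), (45, 1), (48, 1),
       (161, 0)]


def life_table(data, z=1.96):
    """Kaplan-Meier life table with Greenwood standard errors.

    Confidence bounds are computed on the log scale (an assumption).
    """
    rows = []
    surv = 1.0
    greenwood_sum = 0.0
    event_times = sorted({t for t, status in data if status == 1})
    for t in event_times:
        # At risk: no event and no censoring strictly before t.
        n_risk = sum(1 for time, _ in data if time >= t)
        n_event = sum(1 for time, s in data if time == t and s == 1)
        surv *= 1.0 - n_event / n_risk  # product-limit update
        if surv == 0.0:
            # Everyone at risk had the event; Greenwood's formula is undefined.
            rows.append({"time": t, "n_risk": n_risk, "n_event": n_event,
                         "survival": 0.0, "std_err": math.nan,
                         "lower": math.nan, "upper": math.nan})
            break
        greenwood_sum += n_event / (n_risk * (n_risk - n_event))
        log_se = math.sqrt(greenwood_sum)  # standard error of log S(t)
        rows.append({
            "time": t, "n_risk": n_risk, "n_event": n_event,
            "survival": surv,
            "std_err": surv * log_se,
            "lower": math.exp(math.log(surv) - z * log_se),
            "upper": min(1.0, math.exp(math.log(surv) + z * log_se)),
        })
    return rows


for row in life_table(AML):
    print(row["time"], row["n_risk"], row["n_event"],
          round(row["survival"], 4), round(row["std_err"], 4),
          round(row["lower"], 4), round(row["upper"], 3))
```

The first row should agree with the life table: 23 subjects at risk, 2 events, and an estimated survival of about 0.913.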

====Log-rank test: Testing for differences in survival in the aml data====

The [[log-rank test]] compares the survival times of two or more groups. This example uses a log-rank test for a difference in survival in the maintained versus non-maintained treatment groups in the aml data. The graph shows KM plots for the aml data broken out by treatment group, which is indicated by the variable "x" in the data.

[[File:Kaplan-Meier by treatment in AML.svg|thumb|320px|Kaplan–Meier graph by treatment group in aml]]

The null hypothesis for a log-rank test is that the groups have the same survival. The expected number of subjects surviving at each time point in each group is adjusted for the number of subjects at risk in the groups at each event time. The log-rank test determines if the observed number of events in each group is significantly different from the expected number. The formal test is based on a chi-squared statistic. When the log-rank statistic is large, it is evidence for a difference in the survival times between the groups. When two groups are compared, the log-rank statistic approximately has a [[chi-squared distribution]] with one degree of freedom (in general, one fewer than the number of groups), and the [[p-value]] is calculated using the [[chi-squared test]].

For the example data, the log-rank test for difference in survival gives a p-value of p=0.0653, indicating that the treatment groups do not differ significantly in survival, assuming an alpha level of 0.05. The sample size of 23 subjects is modest, so there is little [[Power of a test|power]] to detect differences between the treatment groups. The chi-squared test is based on asymptotic approximation, so the p-value should be regarded with caution for small [[Sample size determination|sample sizes]].
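The log-rank calculation for two groups can be sketched as follows. This is an illustration in plain Python, not the R code used for the example. The function name <code>logrank_two_groups</code> is hypothetical, the two lists are transcribed from the aml table, and the variance is assumed to be the usual hypergeometric variance summed over event times.

```python
import math

# (time in weeks, status) pairs from the aml table; status 1 = event.
MAINTAINED = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
              (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]
NONMAINTAINED = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
                 (23, 1), (27, 1), (30, 1), (33, 1), (43, 1), (45, 1)]


def logrank_two_groups(group1, group2):
    """Return (chi-squared statistic, p-value, observed, expected) for group1."""
    pooled = group1 + group2
    event_times = sorted({t for t, s in pooled if s == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        # Numbers at risk and numbers of events at time t in each group.
        n1 = sum(1 for time, _ in group1 if time >= t)
        n2 = sum(1 for time, _ in group2 if time >= t)
        d1 = sum(1 for time, s in group1 if time == t and s == 1)
        d2 = sum(1 for time, s in group2 if time == t and s == 1)
        n, d = n1 + n2, d1 + d2
        observed += d1
        expected += d * n1 / n  # expected events under the null hypothesis
        if n > 1:
            variance += d * n1 * n2 * (n - d) / (n ** 2 * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, observed, expected


chi2, p_value, observed, expected = logrank_two_groups(MAINTAINED, NONMAINTAINED)
print(round(chi2, 2), round(p_value, 4), observed, round(expected, 2))
```

With these data the sketch should give a statistic of about 3.4 and a p-value of about 0.065, in line with the value quoted above. The maintained group has 7 observed events against roughly 10.7 expected under the null hypothesis.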

===Cox proportional hazards (PH) regression analysis===

Kaplan–Meier curves and log-rank tests are most useful when the predictor variable is categorical (e.g., drug vs. placebo), or takes a small number of values (e.g., drug doses 0, 20, 50, and 100&nbsp;mg/day) that can be treated as categorical. The log-rank test and KM curves do not work easily with quantitative predictors such as gene expression, white blood count, or age. For quantitative predictor variables, an alternative method is [[Proportional hazards model#The Cox model|Cox proportional hazards regression]] analysis. Cox PH models also work with categorical predictor variables, which are encoded as {0,1} indicator or dummy variables. The log-rank test is a special case of a Cox PH analysis, and can be performed using Cox PH software.

====Example: Cox proportional hazards regression analysis for melanoma====

This example uses the melanoma data set from Dalgaard, Chapter 14.<ref name="Dalgaard2008">{{Citation
|last1= Dalgaard
|first1= Peter
|title= Introductory Statistics with R
|edition=Second
|year=2008
|publisher= Springer
|isbn= 978-0387790534
}}
</ref>

Data are in the R package ISwR. The Cox proportional hazards regression using{{nbsp}}R gives the results shown in the box.
	
[[File:Cox proportional hazards regression output for melanoma data set.png|thumb|400px|right|Cox proportional hazards regression output for melanoma data. Predictor variable is sex 1: female, 2: male.]]

The Cox regression results are interpreted as follows.

*Sex is encoded as a numeric vector (1: female, 2: male). The R{{nbsp}}summary for the Cox model gives the hazard ratio (HR) for the second group relative to the first group, that is, male versus female.
*coef = 0.662 is the estimated logarithm of the hazard ratio for males versus females.
*exp(coef) = exp(0.662) = 1.94 is the hazard ratio, obtained by exponentiating the log hazard ratio. The estimated hazard ratio of 1.94 indicates that males have a higher risk of death (lower survival rates) than females in these data.
*se(coef) = 0.265 is the standard error of the log hazard ratio.
*z = 2.5 = coef/se(coef) = 0.662/0.265. Dividing the coef by its standard error gives the z score.
*p=0.013. The p-value corresponding to z=2.5 for sex is p=0.013, indicating that there is a significant difference in survival as a function of sex.

The summary output also gives the lower and upper bounds of the 95% confidence interval for the hazard ratio: lower 95% bound = 1.15; upper 95% bound = 3.26.
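The quantities in the Cox output are linked by simple arithmetic, which the following sketch illustrates. It starts from the rounded coefficient and standard error quoted above, so the results may differ from the R output in the last digit. The function name <code>wald_summary</code> is illustrative, and the sketch assumes a two-sided test based on the normal distribution with a critical value of 1.96.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Hazard ratio, z score, two-sided p-value and 95% CI from coef and se."""
    z = coef / se
    return {
        "hazard_ratio": math.exp(coef),
        "z": z,
        # Two-sided p-value from the standard normal distribution.
        "p_value": math.erfc(abs(z) / math.sqrt(2)),
        # The interval is built on the log scale, then exponentiated.
        "ci_lower": math.exp(coef - z_crit * se),
        "ci_upper": math.exp(coef + z_crit * se),
    }


summary = wald_summary(coef=0.662, se=0.265)
for name, value in summary.items():
    print(name, round(value, 4))
```

Because the confidence interval is computed for the log hazard ratio and then exponentiated, it is not symmetric around the hazard ratio of 1.94.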

Finally, the output gives p-values for three alternative tests for overall significance of the model:

*Likelihood ratio test = 6.15 on 1 df, p=0.0131
*Wald test = 6.24 on 1 df, p=0.0125
*Score (log-rank) test = 6.47 on 1 df, p=0.0110

These three tests are asymptotically equivalent. For large enough N, they will give similar results. For small N, they may differ somewhat. The last row, "Score (logrank) test", gives the result of the log-rank test (p=0.011), because the log-rank test is a special case of a Cox PH regression. The likelihood ratio test has better behavior for small sample sizes, so it is generally preferred.

====Cox model using a covariate in the melanoma data====

The Cox model extends the log-rank test by allowing the inclusion of additional covariates.<ref>{{Cite journal |last1=Saegusa |first1=Takumi |last2=Di |first2=Chongzhi |last3=Chen |first3=Ying Qing |date=September 2014 |title=Hypothesis testing for an extended cox model with time-varying coefficients |journal=Biometrics |language=en |volume=70 |issue=3 |pages=619–628 |doi=10.1111/biom.12185 |pmid=24888739 |issn=0006-341X|pmc=4247822 }}</ref> This example uses the melanoma data set where the predictor variables include a continuous covariate, the thickness of the tumor (variable name = "thick").

[[File:Histograms of melanoma thickness.png|thumb|700px|Histograms of melanoma tumor thickness]]

In the histograms, the thickness values are [[Skewness|positively skewed]] and do not have a [[Normal distribution|Gaussian]]-like, [[Symmetric probability distribution|symmetric distribution]]. Regression models, including the Cox model, generally give more reliable results with normally-distributed variables.{{Citation needed|date=February 2023}} For this example, a [[logarithm]]ic transform is used. The log of the thickness of the tumor appears to be more nearly normally distributed, so the Cox models use log thickness. The Cox PH analysis gives the results in the box.

[[File:Cox PH output for melanoma with thickness.png|thumb|500px|Cox PH output for melanoma data set with covariate log tumor thickness]]

The p-values for all three overall tests (likelihood, Wald, and score) are significant, indicating that the model is significant. The p-value for log(thick) is 6.9e-07, with a hazard ratio HR = exp(coef) = 2.18, indicating a strong relationship between the thickness of the tumor and increased risk of death.

By contrast, the p-value for sex is now p=0.088. The hazard ratio HR = exp(coef) = 1.58, with a 95% confidence interval of 0.934 to 2.68. Because the confidence interval for HR includes 1, these results indicate that sex makes a smaller contribution to the hazard after controlling for the thickness of the tumor, and only trends toward significance. Examination of graphs of log(thickness) by sex and a t-test of log(thickness) by sex both indicate that there is a significant difference between men and women in the thickness of the tumor when they first see the clinician.

The Cox model assumes that the hazards are proportional. The proportional hazard assumption may be tested using the R{{nbsp}}function cox.zph(). A p-value less than 0.05 is evidence that the hazards are not proportional. For the melanoma data we obtain p=0.222. Hence, we cannot reject the null hypothesis of the hazards being proportional. Additional tests and graphs for examining a Cox model are described in the textbooks cited.

====Extensions to Cox models====

Cox models can be extended to deal with variations on the simple analysis. 
	
*Stratification. The subjects can be divided into strata, where subjects within a stratum are expected to be relatively more similar to each other than to randomly chosen subjects from other strata. The regression parameters are assumed to be the same across the strata, but a different baseline hazard may exist for each stratum. Stratification is useful for analyses using matched subjects, for dealing with patient subsets, such as different clinics, and for dealing with violations of the proportional hazard assumption.
*Time-varying covariates. Some variables, such as gender and treatment group, generally stay the same in a clinical trial. Other clinical variables, such as serum protein levels or dose of concomitant medications may change over the course of a study. Cox models may be extended for such time-varying covariates.
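Time-varying covariates are commonly handled by splitting each subject's follow-up into intervals over which the covariate is constant, with the event indicator set only on the last interval. The sketch below illustrates this data layout. The function name <code>split_follow_up</code> and the example values are hypothetical, and the (start, stop, event) layout is an assumption modeled on the counting-process format accepted by survival software such as the R survival package.

```python
def split_follow_up(end_time, event, changes, baseline):
    """Split one subject's follow-up into (start, stop, event, value) rows.

    end_time: time of the event or of censoring
    event:    1 if the follow-up ended with an event, 0 if censored
    changes:  list of (time, new_value) pairs for the covariate
    baseline: covariate value at time zero
    """
    rows = []
    start, value = 0.0, baseline
    for change_time, new_value in sorted(changes):
        if change_time >= end_time:
            break  # changes after the end of follow-up are not observed
        if change_time > start:
            # The covariate was constant on (start, change_time]; no event yet.
            rows.append((start, change_time, 0, value))
        start, value = change_time, new_value
    # Only the final interval can carry the event.
    rows.append((start, end_time, event, value))
    return rows


# Hypothetical subject: event at week 30, covariate measured at weeks 0, 10, 20.
for row in split_follow_up(30, 1, [(10, 5.0), (20, 7.5)], baseline=3.0):
    print(row)
```

Each row contributes to the risk set only during its own interval, so the model always uses the covariate value that was current at each event time.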

===Tree-structured survival models===

The Cox PH regression model is a linear model. It is similar to linear regression and logistic regression. Specifically, these methods assume that a single line, curve, plane, or surface is sufficient to separate groups (alive, dead) or to estimate a quantitative response (survival time).

In some cases alternative partitions give more accurate classification or quantitative estimates. One set of alternative methods is tree-structured survival models,<ref>{{Cite journal|last=Segal|first=Mark Robert|date=1988|title=Regression Trees for Censored Data|url=https://www.jstor.org/stable/2531894|journal=Biometrics|volume=44|issue=1|pages=35–47|doi=10.2307/2531894|jstor=2531894|s2cid=60974957 |url-access=subscription}}</ref><ref>{{Cite journal|last1=Leblanc|first1=Michael|last2=Crowley|first2=John|date=1993|title=Survival Trees by Goodness of Split|url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1993.10476296|journal=Journal of the American Statistical Association|language=en|volume=88|issue=422|pages=457–467|doi=10.1080/01621459.1993.10476296|issn=0162-1459|url-access=subscription}}</ref><ref>{{Cite journal|last1=Ritschard|first1=Gilbert|last2=Gabadinho|first2=Alexis|last3=Muller|first3=Nicolas S.|last4=Studer|first4=Matthias|date=2008|title=Mining event histories: a social science perspective|url=http://www.inderscience.com/link.php?id=22538|journal=International Journal of Data Mining, Modelling and Management|language=en|volume=1|issue=1|pages=68|doi=10.1504/IJDMMM.2008.022538|issn=1759-1163}}</ref> including survival random forests.<ref name=":0">{{Cite journal|last1=Ishwaran|first1=Hemant|last2=Kogalur|first2=Udaya B.|last3=Blackstone|first3=Eugene H.|last4=Lauer|first4=Michael S.|date=2008-09-01|title=Random survival forests|journal=The Annals of Applied Statistics|volume=2|issue=3|doi=10.1214/08-AOAS169|s2cid=2003897|issn=1932-6157|doi-access=free|arxiv=0811.1645}}</ref> Tree-structured survival models may give more accurate predictions than Cox models. Examining both types of models for a given data set is a reasonable strategy.

====Example survival tree analysis====

This example of a survival tree analysis uses the R{{nbsp}}package "rpart".<ref name=":1">{{Cite web|last1=Therneau|first1=Terry J.|last2=Atkinson|first2=Elizabeth J.|title=rpart: Recursive Partitioning and Regression Trees|url=https://CRAN.R-project.org/package=rpart|access-date=November 12, 2021|website=CRAN}}</ref> The example is based on 146 stage{{nbsp}}C prostate cancer patients in the data set stagec in rpart. Rpart and the stagec example are described in Atkinson and Therneau (1997),<ref>{{Cite book|last1=Atkinson|first1=Elizabeth J.|url=https://www.researchgate.net/publication/235665541|title=An introduction to recursive partitioning using the RPART routines|last2=Therneau|first2=Terry J.|publisher=Mayo Foundation|year=1997}}</ref> which is also distributed as a vignette of the rpart package.<ref name=":1" />

The variables in stagec are:
*'''pgtime''': time to progression, or last follow-up free of progression
*'''pgstat''': status at last follow-up (1=progressed, 0=censored)
*'''age''': age at diagnosis
*'''eet''': early endocrine therapy (1=no, 0=yes)
*'''ploidy''': diploid/tetraploid/aneuploid DNA pattern
*'''g2''': % of cells in G2 phase
*'''grade''': tumor grade (1-4)
*'''gleason''': Gleason grade (3-10)

The survival tree produced by the analysis is shown in the figure.

[[File:Survival tree for prostate cancer.png|thumb|700px|Survival tree for prostate cancer data set]]

Each branch in the tree indicates a split on the value of a variable. For example, the root of the tree splits subjects with grade < 2.5 versus subjects with grade 2.5 or greater. The terminal nodes indicate the number of subjects in the node, the number of subjects who have events, and the relative event rate compared to the root. In the node on the far left, the values 1/33 indicate that one of the 33 subjects in the node had an event, and that the relative event rate is 0.122. In the node on the far right bottom, the values 11/15 indicate that 11 of 15 subjects in the node had an event, and the relative event rate is 2.7.

====Survival random forests====

An alternative to building a single survival tree is to build many survival trees, where each tree is constructed using a sample of the data, and average the trees to predict survival.<ref name=":0" /> This is the method underlying the survival random forest models. Survival random forest analysis is available in the R{{nbsp}}package "randomForestSRC".<ref>{{Cite web|last1=Ishwaran|first1=Hemant|last2=Kogalur|first2=Udaya B.|title=randomForestSRC: Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)|url=https://CRAN.R-project.org/package=randomForestSRC|access-date=November 12, 2021|website=CRAN}}</ref>

The randomForestSRC package includes an example survival random forest analysis using the data set pbc. The data are from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver, conducted between 1974 and 1984. In the example, the random forest survival model gives more accurate predictions of survival than the Cox PH model. The prediction errors are estimated by [[Bootstrapping (statistics)|bootstrap re-sampling]].

===Deep learning survival models===

Recent advancements in deep representation learning have been extended to survival estimation. The DeepSurv<ref>{{cite journal |last1=Katzman |first1=Jared L. |display-authors=etal |title= DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network |journal=[[BMC Medical Research Methodology]] |year=2018}}</ref> model replaces the log-linear parameterization of the Cox PH model with a multi-layer perceptron. Further extensions like Deep Survival Machines<ref>{{cite journal |last=Nagpal |first=Chirag |title=Deep survival machines: Fully parametric survival regression and representation learning for censored data with competing risks. |journal= IEEE Journal of Biomedical and Health Informatics |volume=25 |year=2021 |issue=8|pages=3163–3175 |doi=10.1109/JBHI.2021.3052441 |pmid=33460387 |arxiv=2003.01176 |s2cid=211817982 }}</ref> and Deep Cox Mixtures<ref>{{cite journal |last=Nagpal |first=Chirag |title= Deep Cox mixtures for survival regression. |journal= Machine Learning for Healthcare Conference |year=2021 |arxiv=2101.06536 }}</ref> use latent variable mixture models to represent the time-to-event distribution as a mixture of parametric or semi-parametric distributions while jointly learning representations of the input covariates. Deep learning approaches have been reported to perform well, especially on complex input data modalities such as images and clinical time-series.