
===Tree-structured survival models===

The Cox PH regression model is a linear model: like linear regression and logistic regression, it combines the covariates into a single linear predictor. These methods therefore assume that a single line, curve, plane, or surface is sufficient to separate groups (alive, dead) or to estimate a quantitative response (survival time).
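In the Cox model, "linear" refers to the way the covariates enter the hazard. For a subject with covariates <math>x_1, \ldots, x_p</math>, the hazard at time <math>t</math> is modelled as

:<math>h(t \mid x) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p),</math>

where <math>h_0(t)</math> is an unspecified baseline hazard and <math>\beta_1, \ldots, \beta_p</math> are regression coefficients. The log hazard ratio <math>\log\{h(t \mid x)/h_0(t)\} = \beta_1 x_1 + \cdots + \beta_p x_p</math> is therefore a linear function of the covariates, and setting it equal to a constant defines a hyperplane. Grouping subjects by a cutoff on predicted risk thus amounts to separating them with a single plane in covariate space.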

In some cases, alternative ways of partitioning the data give more accurate classification or quantitative estimates. One set of alternative methods is tree-structured survival models,<ref>{{Cite journal|last=Segal|first=Mark Robert|date=1988|title=Regression Trees for Censored Data|url=https://www.jstor.org/stable/2531894|journal=Biometrics|volume=44|issue=1|pages=35–47|doi=10.2307/2531894|jstor=2531894|s2cid=60974957 |url-access=subscription}}</ref><ref>{{Cite journal|last1=Leblanc|first1=Michael|last2=Crowley|first2=John|date=1993|title=Survival Trees by Goodness of Split|url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1993.10476296|journal=Journal of the American Statistical Association|language=en|volume=88|issue=422|pages=457–467|doi=10.1080/01621459.1993.10476296|issn=0162-1459|url-access=subscription}}</ref><ref>{{Cite journal|last1=Ritschard|first1=Gilbert|last2=Gabadinho|first2=Alexis|last3=Muller|first3=Nicolas S.|last4=Studer|first4=Matthias|date=2008|title=Mining event histories: a social science perspective|url=http://www.inderscience.com/link.php?id=22538|journal=International Journal of Data Mining, Modelling and Management|language=en|volume=1|issue=1|pages=68|doi=10.1504/IJDMMM.2008.022538|issn=1759-1163}}</ref> including survival random forests.<ref name=":0">{{Cite journal|last1=Ishwaran|first1=Hemant|last2=Kogalur|first2=Udaya B.|last3=Blackstone|first3=Eugene H.|last4=Lauer|first4=Michael S.|date=2008-09-01|title=Random survival forests|journal=The Annals of Applied Statistics|volume=2|issue=3|doi=10.1214/08-AOAS169|s2cid=2003897|issn=1932-6157|doi-access=free|arxiv=0811.1645}}</ref> Tree-structured survival models may give more accurate predictions than Cox models. Examining both types of models for a given data set is a reasonable strategy.
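To make the idea of a partition concrete, the Python sketch below chooses a single split on one covariate: for every cutoff between adjacent observed values it computes the two-sample log-rank statistic comparing the subjects on either side, and keeps the cutoff with the largest statistic. The log-rank criterion corresponds to the goodness-of-split approach of LeBlanc and Crowley and is the default split rule for survival forests in randomForestSRC; rpart's survival trees use a different splitting criterion, so this illustrates the general idea rather than rpart's own method. The data and the function names <code>logrank_statistic</code> and <code>best_split</code> are hypothetical.

```python
# Toy split search for a survival tree (hypothetical data and helper names).


def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic (1 df) comparing left vs right."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Risk set: subjects whose follow-up time is at least t.
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_left = sum(1 for i in at_risk if in_left[i])
        # Events occurring exactly at time t, overall and in the left group.
        failed = [i for i in at_risk if times[i] == t and events[i] == 1]
        d = len(failed)
        d_left = sum(1 for i in failed if in_left[i])
        # Observed minus expected events in the left group, and its variance.
        obs_minus_exp += d_left - d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split(x, times, events):
    """Return (cutoff, statistic) for the split x < cutoff with the largest log-rank statistic."""
    values = sorted(set(x))
    best_cutoff, best_stat = None, 0.0
    for lo, hi in zip(values, values[1:]):
        cutoff = (lo + hi) / 2  # midpoint between adjacent observed values
        stat = logrank_statistic(times, events, [xi < cutoff for xi in x])
        if stat > best_stat:
            best_cutoff, best_stat = cutoff, stat
    return best_cutoff, best_stat


# Hypothetical grades, follow-up times, and event indicators (1 = event, 0 = censored).
grade = [1, 1, 2, 2, 3, 3, 4, 4]
follow_up = [10, 12, 11, 9, 3, 4, 2, 5]
status = [0, 1, 0, 1, 1, 1, 1, 1]
cutoff, stat = best_split(grade, follow_up, status)
print(cutoff)  # 2.5
```

A full tree applies the same search to every covariate, splits on the best one, and repeats the search within each resulting group until a stopping rule (for example a minimum node size) is met.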

====Example survival tree analysis====

This example of a survival tree analysis uses the R{{nbsp}}package "rpart".<ref name=":1">{{Cite web|last1=Therneau|first1=Terry J.|last2=Atkinson|first2=Elizabeth J.|title=rpart: Recursive Partitioning and Regression Trees|url=https://CRAN.R-project.org/package=rpart|access-date=November 12, 2021|website=CRAN}}</ref> The example is based on 146 stage{{nbsp}}C prostate cancer patients in rpart's data set stagec. Rpart and the stagec example are described in Atkinson and Therneau (1997),<ref>{{Cite book|last1=Atkinson|first1=Elizabeth J.|url=https://www.researchgate.net/publication/235665541|title=An introduction to recursive partitioning using the RPART routines|last2=Therneau|first2=Terry J.|publisher=Mayo Foundation|year=1997}}</ref> which is also distributed as a vignette of the rpart package.<ref name=":1" />

The variables in stagec are:
*'''pgtime''': time to progression, or last follow-up free of progression
*'''pgstat''': status at last follow-up (1=progressed, 0=censored)
*'''age''': age at diagnosis
*'''eet''': early endocrine therapy (1=no, 0=yes)
*'''ploidy''': diploid/tetraploid/aneuploid DNA pattern
*'''g2''': % of cells in G2 phase
*'''grade''': tumor grade (1-4)
*'''gleason''': Gleason grade (3-10)

The survival tree produced by the analysis is shown in the figure.

[[File:Survival tree for prostate cancer.png|thumb|700px|Survival tree for prostate cancer data set]]

Each branch in the tree indicates a split on the value of a variable. For example, the root of the tree separates subjects with grade < 2.5 from subjects with grade 2.5 or greater. Each terminal node shows the number of subjects in the node, the number of those subjects who had an event, and the event rate in the node relative to the root. In the node on the far left, the values 1/33 indicate that one of the 33 subjects in the node had an event, and the relative event rate is 0.122. In the bottom node on the far right, the values 11/15 indicate that 11 of the 15 subjects in the node had an event, and the relative event rate is 2.7.
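The relative event rate in a node can be thought of as the node's number of events per unit of follow-up time divided by the same quantity for the whole sample at the root. The Python sketch below computes this crude ratio, together with the "events/subjects" label, for a hypothetical node. rpart's survival method additionally shrinks each node's rate toward the overall rate, so this crude ratio will not in general reproduce the numbers in the figure. The data and function names are hypothetical.

```python
def event_rate(times, events):
    """Crude event rate: number of events per unit of follow-up time."""
    return sum(events) / sum(times)


def relative_event_rate(node_times, node_events, root_times, root_events):
    """Event rate in a node divided by the event rate at the root (all subjects)."""
    return event_rate(node_times, node_events) / event_rate(root_times, root_events)


def node_label(node_events):
    """The 'events/subjects' summary shown under a terminal node."""
    return f"{sum(node_events)}/{len(node_events)}"


# Hypothetical follow-up times and event indicators (1 = event, 0 = censored).
root_times = [2, 4, 6, 8, 10, 3, 5, 7]
root_events = [1, 0, 1, 0, 0, 1, 1, 0]
# Suppose the splits send the first five subjects to one terminal node.
node_times, node_events = root_times[:5], root_events[:5]

print(node_label(node_events))  # 2/5
print(round(relative_event_rate(node_times, node_events, root_times, root_events), 3))  # 0.75
```

Here the node has 2 events in 30 units of follow-up and the root has 4 events in 45 units, so the relative rate is (2/30)/(4/45) = 0.75: subjects in this node progress at three quarters of the overall rate.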

====Survival random forests====

An alternative to building a single survival tree is to build many survival trees, each grown on a bootstrap sample of the data, and to average the trees' predictions of survival.<ref name=":0" /> This is the method underlying survival random forest models. Survival random forest analysis is available in the R{{nbsp}}package "randomForestSRC".<ref>{{Cite web|last1=Ishwaran|first1=Hemant|last2=Kogalur|first2=Udaya B.|title=randomForestSRC: Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)|url=https://CRAN.R-project.org/package=randomForestSRC|access-date=November 12, 2021|website=CRAN}}</ref>
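The bootstrap-and-average idea can be sketched in a few lines of Python. This is a deliberately simplified toy, not the random survival forest algorithm of Ishwaran et al.: each "tree" here is a single split at the median of one randomly chosen covariate, whereas a real random survival forest grows deep trees and, at every node, searches a random subset of covariates for the best split (for example by the log-rank statistic). What the sketch shares with the real method is that each tree is grown on a bootstrap sample, each terminal node summarises survival with a Nelson–Aalen estimate of the cumulative hazard, and the ensemble prediction is the average of the trees' estimates. All names and data are hypothetical.

```python
import random


def nelson_aalen(times, events, t):
    """Nelson-Aalen estimate of the cumulative hazard H(t) for one group of subjects."""
    H = 0.0
    event_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for s in event_times:
        n_at_risk = sum(1 for ti in times if ti >= s)
        d = sum(1 for ti, e in zip(times, events) if ti == s and e == 1)
        H += d / n_at_risk
    return H


def fit_stump(X, times, events, rng):
    """Toy one-split 'tree': cut a randomly chosen covariate at its median."""
    j = rng.randrange(len(X[0]))
    cut = sorted(row[j] for row in X)[len(X) // 2]
    leaves = {}
    for side in ("left", "right"):
        idx = [i for i, row in enumerate(X) if (row[j] < cut) == (side == "left")]
        if not idx:  # degenerate split: fall back to all subjects
            idx = list(range(len(X)))
        leaves[side] = ([times[i] for i in idx], [events[i] for i in idx])
    return j, cut, leaves


def stump_chf(stump, x, t):
    """Cumulative hazard at time t from the leaf that covariate vector x falls into."""
    j, cut, leaves = stump
    leaf_times, leaf_events = leaves["left" if x[j] < cut else "right"]
    return nelson_aalen(leaf_times, leaf_events, t)


def fit_forest(X, times, events, n_trees=50, seed=0):
    """Grow each stump on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in boot],
                                [times[i] for i in boot],
                                [events[i] for i in boot], rng))
    return forest


def forest_chf(forest, x, t):
    """Ensemble prediction: average of the trees' cumulative hazard estimates."""
    return sum(stump_chf(tree, x, t) for tree in forest) / len(forest)


# Hypothetical single covariate, follow-up times, and event indicators.
X = [[1], [1], [2], [2], [3], [3], [4], [4]]
follow_up = [10, 12, 11, 9, 3, 4, 2, 5]
status = [0, 1, 0, 1, 1, 1, 1, 1]
forest = fit_forest(X, follow_up, status, n_trees=50, seed=1)
high = forest_chf(forest, [4], 6)  # subject resembling the early-event group
low = forest_chf(forest, [1], 6)   # subject resembling the late-event group
print(round(high, 3), round(low, 3))
```

Averaging over many bootstrap trees smooths out the instability of any single tree's split, which is the main motivation for an ensemble.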

The randomForestSRC package includes an example survival random forest analysis using the data set pbc. These data come from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver, conducted between 1974 and 1984. In the example, the random survival forest gives more accurate predictions of survival than the Cox PH model, with prediction errors estimated by [[Bootstrapping (statistics)|bootstrap re-sampling]].
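Prediction error for survival models is often summarised with Harrell's concordance index C: the proportion of usable pairs of subjects in which the subject who fails first was given the higher predicted risk. randomForestSRC, for example, reports the error rate of a survival forest as 1 − C. The Python sketch below computes C for hypothetical times, event indicators, and risk scores. In a bootstrap assessment, a model would be refit on each bootstrap sample and a measure such as this evaluated on the subjects left out of that sample; that resampling loop is not shown, and the function name is hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when subject i had an event before subject j's time.
    It is concordant when subject i also has the higher predicted risk;
    ties in predicted risk count one half.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical data: subjects 2 and 4 are censored.
times = [1, 2, 3, 4]
events = [1, 0, 1, 0]
risk = [0.6, 0.5, 0.7, 0.1]
c = harrell_c_index(times, events, risk)
print(c, 1 - c)  # 0.75 0.25
```

Of the four usable pairs, only the pair of subjects 1 and 3 is ranked in the wrong order (subject 1 fails first but has the lower risk score), giving C = 3/4 and an error rate of 0.25.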