Proportional hazards model

Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity of time. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. For example, taking a drug may halve one's hazard rate for a stroke occurring, or changing the material from which a manufactured component is constructed may double its hazard rate for failure. Other types of survival models, such as accelerated failure time models, do not exhibit proportional hazards. The accelerated failure time model describes a situation where the biological or mechanical life history of an event is accelerated (or decelerated).

Introduction

Survival models can be viewed as consisting of two parts: the underlying baseline hazard function, often denoted $\lambda _{0}(t)$ , describing how the risk of event per time unit changes over time at baseline levels of covariates; and the effect parameters, describing how the hazard varies in response to explanatory covariates. A typical medical example would include covariates such as treatment assignment, as well as patient characteristics such as age at start of study, gender, and the presence of other diseases at start of study, in order to reduce variability and/or control for confounding.

The proportional hazards condition^[1] states that covariates are multiplicatively related to the hazard. In the simplest case of stationary coefficients, for example, treatment with a drug may halve a subject's hazard at any given time $t$ , while the baseline hazard may vary. Note, however, that this does not double the lifetime of the subject; the precise effect of the covariates on the lifetime depends on the form of $\lambda _{0}(t)$ . The covariate is not restricted to binary predictors; in the case of a continuous covariate $x$ , it is typically assumed that the hazard responds exponentially, so that each unit increase in $x$ multiplies the hazard by a constant factor. The effect of covariates estimated by any proportional hazards model can thus be reported as hazard ratios. The Cox partial likelihood, shown below, is obtained by using Breslow's estimate of the baseline hazard function, plugging it into the full likelihood and then observing that the result is a product of two factors. The first factor is the partial likelihood shown below, in which the baseline hazard has "canceled out". The second factor is free of the regression coefficients and depends on the data only through the censoring pattern.
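The remark that halving the hazard does not, in general, double the lifetime can be checked numerically. The sketch below assumes a Weibull-type baseline with cumulative hazard (t/scale)^shape; this baseline, the function name `weibull_ph_median` and the parameter values are illustrative assumptions, not part of the model definition.

```python
import math

def weibull_ph_median(shape, hazard_ratio, scale=1.0):
    """Median survival time when the baseline cumulative hazard is
    (t / scale) ** shape and the hazard is multiplied by hazard_ratio
    at every time t (assumed baseline, for illustration only)."""
    # S(t) = exp(-hazard_ratio * (t / scale) ** shape) = 0.5, solved for t
    return scale * (math.log(2.0) / hazard_ratio) ** (1.0 / shape)

for shape in (0.5, 1.0, 2.0):
    control = weibull_ph_median(shape, hazard_ratio=1.0)
    treated = weibull_ph_median(shape, hazard_ratio=0.5)
    # The ratio of medians is 2 ** (1 / shape); it equals 2 only when shape == 1
    print(f"shape={shape}: median ratio treated/control = {treated / control:.3f}")
```

Under this assumed baseline, the same hazard ratio of 0.5 changes the median lifetime by a factor that depends on the shape of the baseline hazard.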

Sir David Cox observed that if the proportional hazards assumption holds (or, is assumed to hold) then it is possible to estimate the effect parameter(s) without any consideration of the hazard function. This approach to survival data is called application of the Cox proportional hazards model,^[2] sometimes abbreviated to Cox model or to proportional hazards model. However, Cox also noted that biological interpretation of the proportional hazards assumption can be quite tricky.^[3] ^[4]

The Cox model

Let Y_i denote the observed time (either censoring time or event time) for subject i. Let C_i be the indicator that the time corresponds to an event (i.e. if C_i = 1 the event occurred and if C_i = 0 the time is a censoring time). Let $X_{i}=\{X_{i1},\dots ,X_{ip}\}$ be the realized values of the covariates for subject i. The hazard function for the Cox proportional hazards model has the form

\lambda (t|X_{i})=\lambda _{0}(t)\exp(\beta _{1}X_{i1}+\cdots +\beta _{p}X_{ip})=\lambda _{0}(t)\exp(X_{i}\cdot \beta ).

This expression gives the hazard rate at time t for subject i with covariate vector (explanatory variables) X_i.
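As a small illustration of this formula, the following sketch evaluates the hazard for two hypothetical subjects and shows that their hazard ratio does not depend on t. The baseline hazard, the coefficient values and the name `cox_hazard` are assumptions made up for the example.

```python
import numpy as np

def cox_hazard(t, x, beta, baseline_hazard):
    """Hazard lambda(t | x) = lambda_0(t) * exp(x . beta)."""
    return baseline_hazard(t) * np.exp(np.dot(x, beta))

# Hypothetical baseline hazard and coefficients, chosen only for illustration.
baseline = lambda t: 0.01 * t            # an increasing baseline hazard
beta = np.array([np.log(0.5), 0.03])     # effects of [treatment, age]

treated = np.array([1.0, 60.0])          # treated subject aged 60
control = np.array([0.0, 60.0])          # untreated subject aged 60

for t in (1.0, 5.0, 10.0):
    ratio = (cox_hazard(t, treated, beta, baseline)
             / cox_hazard(t, control, beta, baseline))
    # lambda_0(t) cancels, so the ratio is exp(beta[0]) = 0.5 at every t
    print(t, ratio)
```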

Ignoring ties for the moment, conditioned upon the existence of a unique event at some particular time $t$ , the probability that the event occurs in the subject $i$ for which $C_{i}=1$ and $Y_{i}=t$ is

L_{i}(\beta )={\frac {\theta _{i}}{\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}}},

where $\theta _{j}=\exp(X_{j}\cdot \beta )$ . Observe that the factors of $\lambda _{0}(t)$ that would be present in both the numerator and denominator have canceled out.

Treating the subjects' events as if they were statistically independent, the joint probability of all realized events conditioned upon the existence of events at those times is the partial likelihood:

L(\beta )=\prod _{i:C_{i}=1}{\frac {\theta _{i}}{\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}}}.

The corresponding log partial likelihood is

\ell (\beta )=\sum _{i:C_{i}=1}\left(X_{i}\cdot \beta -\log \sum _{j:Y_{j}\geq Y_{i}}\theta _{j}\right).

This function can be maximized over β to produce maximum partial likelihood estimates of the model parameters.
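The log partial likelihood can be evaluated directly from its definition. The following is a minimal NumPy sketch for data without ties; the function name and argument layout are illustrative choices, and the loop favors readability over speed.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, event):
    """Log partial likelihood, ignoring ties.

    X: (n, p) covariates; time: observed times Y_i; event: indicators C_i.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    eta = X @ np.asarray(beta, dtype=float)     # X_i . beta for every subject
    ll = 0.0
    for i in np.flatnonzero(event == 1):        # sum over observed events only
        at_risk = time >= time[i]               # risk set {j : Y_j >= Y_i}
        ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return ll

# Toy data (made up): four subjects, one covariate, one censored observation.
X = np.array([[0.5], [1.0], [-0.3], [0.2]])
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 0, 1, 1])
print(cox_log_partial_likelihood([0.0], X, time, event))
```

At β = 0 every θ_j equals 1, so the value reduces to minus the sum of the logarithms of the risk-set sizes (here 4, 2 and 1), which gives a quick sanity check.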

The partial score function is

\ell ^{\prime }(\beta )=\sum _{i:C_{i}=1}\left(X_{i}-{\frac {\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}X_{j}}{\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}}}\right),

and the Hessian matrix of the partial log likelihood is

\ell ^{\prime \prime }(\beta )=-\sum _{i:C_{i}=1}\left({\frac {\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}X_{j}X_{j}^{\prime }}{\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}}}-{\frac {\left[\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}X_{j}\right]\left[\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}X_{j}^{\prime }\right]}{\left[\sum _{j:Y_{j}\geq Y_{i}}\theta _{j}\right]^{2}}}\right).

Using this score function and Hessian matrix, the partial likelihood can be maximized using the Newton-Raphson algorithm. The inverse of the negative Hessian matrix (that is, of the observed information matrix), evaluated at the estimate of β, can be used as an approximate variance-covariance matrix for the estimate, and used to produce approximate standard errors for the regression coefficients.
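The Newton-Raphson iteration can be sketched in a few lines of NumPy using the score and Hessian formulas above (no tie handling). The simulated data, the true coefficient 0.7, the function names and the stopping rule are assumptions made for this example; production software adds safeguards such as step halving.

```python
import numpy as np

def score_and_hessian(beta, X, time, event):
    """Partial score vector and Hessian matrix, ignoring ties."""
    theta = np.exp(X @ beta)
    p = X.shape[1]
    score = np.zeros(p)
    hess = np.zeros((p, p))
    for i in np.flatnonzero(event == 1):
        r = time >= time[i]                      # risk set of subject i
        w, Xr = theta[r], X[r]
        s0 = w.sum()                             # sum of theta_j
        s1 = w @ Xr                              # sum of theta_j X_j
        s2 = (Xr * w[:, None]).T @ Xr            # sum of theta_j X_j X_j'
        score += X[i] - s1 / s0
        hess -= s2 / s0 - np.outer(s1, s1) / s0 ** 2
    return score, hess

def fit_cox_newton(X, time, event, n_iter=25, tol=1e-10):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score, hess = score_and_hessian(beta, X, time, event)
        step = np.linalg.solve(hess, score)
        beta = beta - step                       # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    _, hess = score_and_hessian(beta, X, time, event)
    cov = np.linalg.inv(-hess)                   # inverse observed information
    return beta, np.sqrt(np.diag(cov))

# Simulated data (assumption): exponential event times with hazard exp(0.7 x),
# independent exponential censoring.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 1))
T = rng.exponential(scale=np.exp(-0.7 * X[:, 0]))
C = rng.exponential(scale=2.0, size=n)
time = np.minimum(T, C)
event = (T <= C).astype(int)

beta_hat, se = fit_cox_newton(X, time, event)
print(beta_hat, se)
```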

Tied times

Several approaches have been proposed to handle situations in which there are ties in the time data. Breslow's method applies the procedure described above unmodified, even when ties are present. An alternative approach that is considered to give better results is Efron's method.^[5] Let t_j denote the unique times, let H_j denote the set of indices i such that Y_i = t_j and C_i = 1, and let m_j = |H_j|. In the formulas below, m is shorthand for m_j. Efron's approach maximizes the following partial likelihood.

L(\beta )=\prod _{j}{\frac {\prod _{i\in H_{j}}\theta _{i}}{\prod _{\ell =0}^{m-1}[\sum _{i:Y_{i}\geq t_{j}}\theta _{i}-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}]}}.

The corresponding log partial likelihood is

\ell (\beta )=\sum _{j}\left(\sum _{i\in H_{j}}X_{i}\cdot \beta -\sum _{\ell =0}^{m-1}\log \left(\sum _{i:Y_{i}\geq t_{j}}\theta _{i}-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}\right)\right),

the score function is

\ell ^{\prime }(\beta )=\sum _{j}\left(\sum _{i\in H_{j}}X_{i}-\sum _{\ell =0}^{m-1}{\frac {\sum _{i:Y_{i}\geq t_{j}}\theta _{i}X_{i}-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}X_{i}}{\sum _{i:Y_{i}\geq t_{j}}\theta _{i}-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}}}\right),

and the Hessian matrix is

\ell ^{\prime \prime }(\beta )=-\sum _{j}\sum _{\ell =0}^{m-1}\left({\frac {\sum _{i:Y_{i}\geq t_{j}}\theta _{i}X_{i}X_{i}^{\prime }-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}X_{i}X_{i}^{\prime }}{\phi _{j,\ell ,m}}}-{\frac {Z_{j,\ell ,m}Z_{j,\ell ,m}^{\prime }}{\phi _{j,\ell ,m}^{2}}}\right),

where

\phi _{j,\ell ,m}=\sum _{i:Y_{i}\geq t_{j}}\theta _{i}-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}

Z_{j,\ell ,m}=\sum _{i:Y_{i}\geq t_{j}}\theta _{i}X_{i}-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}X_{i}.

Note that when H_j is empty (all observations with time t_j are censored), the summands in these expressions are treated as zero.
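Efron's log partial likelihood can be computed by looping over the distinct event times. The sketch below follows the formula above; the function name and the toy data are illustrative assumptions. Times at which H_j is empty are simply skipped, which matches treating their summands as zero.

```python
import numpy as np

def efron_log_partial_likelihood(beta, X, time, event):
    """Efron's log partial likelihood for possibly tied event times."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    eta = X @ np.asarray(beta, dtype=float)
    theta = np.exp(eta)
    ll = 0.0
    for t in np.unique(time[event == 1]):        # distinct event times t_j
        at_risk = time >= t                      # {i : Y_i >= t_j}
        tied = (time == t) & (event == 1)        # H_j
        m = int(tied.sum())                      # m_j = |H_j|
        risk_sum = theta[at_risk].sum()
        tied_sum = theta[tied].sum()
        ll += eta[tied].sum()
        for l in range(m):
            ll -= np.log(risk_sum - (l / m) * tied_sum)
    return ll

# Toy data (made up): two events tied at time 1, one censored observation.
X = np.array([[0.2], [-0.1], [0.4], [0.0]])
time = np.array([1.0, 1.0, 2.0, 3.0])
event = np.array([1, 1, 1, 0])
print(efron_log_partial_likelihood([0.0], X, time, event))
```

At β = 0 the tied pair contributes -log 4 - log 3 under Efron's method, whereas applying the untied formula unmodified (Breslow's method) would contribute -2 log 4.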

Time-varying predictors and coefficients

Extensions to time-dependent variables, time-dependent strata, and multiple events per subject can be incorporated through the counting process formulation of Andersen and Gill.^[6]

In addition to allowing time-varying covariates (i.e., predictors), the Cox model may be generalized to time-varying coefficients as well. That is, the proportional effect of a treatment may vary with time; e.g. a drug may be very effective if administered within one month of morbidity, and become less effective as time goes on. The hypothesis of no change with time (stationarity) of the coefficient may then be tested. Details and software (R package) are available in Martinussen and Scheike (2006).^[7]^[8] The application of the Cox model with time-varying covariates is considered in reliability mathematics.^[9]

It is also theoretically possible to specify the effect of covariates through additive hazards,^[10] i.e. by specifying

\lambda (t|X_{i})=\lambda _{0}(t)+\beta _{1}X_{i1}+\cdots +\beta _{p}X_{ip}=\lambda _{0}(t)+X_{i}\cdot \beta .

If such additive hazards models are used in situations where (log-)likelihood maximization is the objective, care must be taken to restrict $\lambda (t|X_{i})$ to non-negative values. Perhaps as a result of this complication, such models are seldom seen. If the objective is instead least squares, the non-negativity restriction is not strictly required.

Specifying the baseline hazard function

The Cox model may be specialized if a reason exists to assume that the baseline hazard follows a particular form. In this case, the baseline hazard $\lambda _{0}(t)$ is replaced by a given function. For example, assuming the hazard function to be the Weibull hazard function gives the Weibull proportional hazards model.

Incidentally, the Weibull baseline hazard is the only case in which a model is both a proportional hazards model and an accelerated failure time model.
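One direction of this statement can be verified with a short calculation. The parametrization below (rate ρ, shape k, and an accelerated failure time coefficient vector γ) is an assumed convention introduced only for this sketch; it shows that a Weibull accelerated failure time model has proportional hazards with β = -kγ, and does not prove uniqueness.

```latex
% Assumed Weibull baseline hazard with rate \rho > 0 and shape k > 0
\lambda_0(t) = \rho k t^{k-1}

% Accelerated failure time form: S(t \mid X_i) = S_0\left(t e^{-X_i \cdot \gamma}\right), hence
\lambda(t \mid X_i)
  = e^{-X_i \cdot \gamma} \, \lambda_0\left(t e^{-X_i \cdot \gamma}\right)
  = \rho k t^{k-1} e^{-k X_i \cdot \gamma}
  = \lambda_0(t) \exp(X_i \cdot \beta),
  \qquad \beta = -k\gamma .
```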

The generic term parametric proportional hazards models can be used to describe proportional hazards models in which the hazard function is specified. The Cox proportional hazards model is sometimes called a semiparametric model by contrast.

Some authors use the term Cox proportional hazards model even when specifying the underlying hazard function,^[11] to acknowledge the debt of the entire field to David Cox.

The term Cox regression model (omitting proportional hazards) is sometimes used to describe the extension of the Cox model to include time-dependent factors. However, this usage is potentially ambiguous since the Cox proportional hazards model can itself be described as a regression model.

Relationship to Poisson models

There is a relationship between proportional hazards models and Poisson regression models which is sometimes used to fit approximate proportional hazards models in software for Poisson regression. The usual reason for doing this is that calculation is much quicker. This was more important in the days of slower computers but can still be useful for particularly large data sets or complex problems. Laird and Olivier (1981)^[12] provide the mathematical details. They note, "we do not assume [the Poisson model] is true, but simply use it as a device for deriving the likelihood." McCullagh and Nelder's^[13] book on generalized linear models has a chapter on converting proportional hazards models to generalized linear models.
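The simplest instance of this relationship can be checked numerically. The sketch below assumes a constant baseline hazard exp(alpha), which is a special case chosen for brevity rather than the general construction of Laird and Olivier. In that case the censored-data log-likelihood and a Poisson log-likelihood for the event indicator with offset log(time) differ only by a term that does not involve the parameters, so both are maximized by the same estimates. Function names and data are illustrative.

```python
import numpy as np

def exponential_ph_loglik(alpha, beta, X, time, event):
    """Log-likelihood of a PH model with constant baseline hazard exp(alpha)."""
    eta = alpha + X @ beta                       # log hazard of each subject
    return np.sum(event * eta - np.exp(eta) * time)

def poisson_loglik(alpha, beta, X, time, event):
    """Poisson log-likelihood for the event indicator, offset log(time)."""
    log_mu = np.log(time) + alpha + X @ beta     # log of the Poisson mean
    return np.sum(event * log_mu - np.exp(log_mu))   # log(d!) = 0 for d in {0, 1}

# Toy data (made up): one binary covariate, two censored observations.
X = np.array([[0.0], [1.0], [0.0], [1.0], [1.0]])
time = np.array([2.0, 3.5, 1.2, 4.0, 0.7])
event = np.array([1, 0, 1, 1, 0])

for alpha, b in [(-1.0, 0.3), (0.4, -0.8)]:
    beta = np.array([b])
    diff = (poisson_loglik(alpha, beta, X, time, event)
            - exponential_ph_loglik(alpha, beta, X, time, event))
    # The difference is sum(event * log(time)) whatever the parameter values
    print(alpha, b, diff)
```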

Under high-dimensional setup

In the high-dimensional setting, where the number of covariates p is large compared to the sample size n, the LASSO method is one of the classical model-selection strategies. Tibshirani (1997) proposed a Lasso procedure for the proportional hazards regression parameter.^[14] The Lasso estimator of the regression parameter β is defined as the minimizer of the negative Cox partial log-likelihood subject to an L¹-norm type constraint, written here in its equivalent penalized form:

{\hat {\beta }}=\arg \min _{\beta }\left\{-\sum _{j}\left(\sum _{i\in H_{j}}X_{i}\cdot \beta -\sum _{\ell =0}^{m-1}\log \left(\sum _{i:Y_{i}\geq t_{j}}\theta _{i}-{\frac {\ell }{m}}\sum _{i\in H_{j}}\theta _{i}\right)\right)+\lambda \|\beta \|_{1}\right\},

where λ ≥ 0 denotes the tuning parameter that controls the amount of shrinkage (not the hazard function).
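An L¹-penalized fit of this kind can be sketched with proximal gradient descent (soft thresholding with a backtracking step size). This is a generic optimization technique swapped in for illustration, not the algorithm of Tibshirani (1997), and the sketch uses the partial likelihood without tie correction. The function names, the simulated data and the value lam = 20 are assumptions.

```python
import numpy as np

def neg_log_pl_and_grad(beta, X, event, at_risk):
    """Negative log partial likelihood (no tie correction) and its gradient."""
    eta = X @ beta
    theta = np.exp(eta)
    s0 = at_risk @ theta                     # sum of theta_j over each risk set
    s1 = at_risk @ (theta[:, None] * X)      # sum of theta_j X_j over each risk set
    d = event == 1
    value = -np.sum(eta[d] - np.log(s0[d]))
    grad = -np.sum(X[d] - s1[d] / s0[d, None], axis=0)
    return value, grad

def soft_threshold(z, a):
    """Proximal operator of a * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - a, 0.0)

def cox_lasso(X, time, event, lam, n_iter=300):
    n, p = X.shape
    # at_risk[i, j] = 1 if subject j is still at risk at the time of subject i
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    beta = np.zeros(p)
    step = 1.0 / n
    for _ in range(n_iter):
        f, g = neg_log_pl_and_grad(beta, X, event, at_risk)
        while True:                          # backtracking on the step size
            new = soft_threshold(beta - step * g, step * lam)
            diff = new - beta
            f_new, _ = neg_log_pl_and_grad(new, X, event, at_risk)
            if f_new <= f + g @ diff + (diff @ diff) / (2 * step) + 1e-12:
                break
            step *= 0.5
        beta = new
    return beta

# Simulated data (assumption): only the first two of ten covariates matter.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:2] = [1.0, -1.0]
T = rng.exponential(scale=np.exp(-X @ true_beta))
C = rng.exponential(scale=2.0, size=n)
time = np.minimum(T, C)
event = (T <= C).astype(int)

print(np.round(cox_lasso(X, time, event, lam=20.0), 3))
```

Larger values of `lam` set more coefficients exactly to zero; once `lam` exceeds the largest absolute component of the gradient at β = 0, the estimate is identically zero.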

Theoretical properties of the Lasso and related regularized estimators for the Cox model, such as oracle inequalities, have since been studied.^[15]^[16]^[17]^[18]

Notes

↑ Breslow, N. E. (1975). "Analysis of Survival Data under the Proportional Hazards Model". International Statistical Review / Revue Internationale de Statistique. 43 (1): 45–57. doi:10.2307/1402659. JSTOR 1402659.
↑ Cox, David R (1972). "Regression Models and Life-Tables". Journal of the Royal Statistical Society, Series B. 34 (2): 187–220. JSTOR 2985181. MR 0341758
↑ Reid, N. (1994). "A Conversation with Sir David Cox". Statistical Science. 9 (3): 439–455. doi:10.1214/ss/1177010394.
↑ Cox, D. R. (1997). Some remarks on the analysis of survival data. the First Seattle Symposium of Biostatistics: Survival Analysis.
↑ Efron, Bradley (1977). "The Efficiency of Cox's Likelihood Function for Censored Data". Journal of the American Statistical Association. 72 (359): 557–565. doi:10.1080/01621459.1977.10480613. JSTOR 2286217.
↑ Andersen, P.; Gill, R. (1982). "Cox's regression model for counting processes, a large sample study.". Annals of Statistics. 10 (4): 1100–1120. doi:10.1214/aos/1176345976. JSTOR 2240714.
↑ Martinussen; Scheike (2006). Dynamic Regression Models for Survival Data. Springer. doi:10.1007/0-387-33960-4. ISBN 978-0-387-20274-7.
↑ "timereg: Flexible Regression Models for Survival Data". CRAN.
↑ Wu, S.; Scarf, P. (2015). "Decline and repair, and covariate effects". European Journal of Operational Research. 244 (1): 219–226. doi:10.1016/j.ejor.2016.07.052.
↑ Cox, D. R. (1997). Some remarks on the analysis of survival data. the First Seattle Symposium of Biostatistics: Survival Analysis.
↑ Bender, R.; Augustin, T.; Blettner, M. (2006). "Generating survival times to simulate Cox proportional hazards models". Statistics in Medicine. 24: 1713–1723. doi:10.1002/sim.2369.
↑ Nan Laird and Donald Olivier (1981). "Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques". Journal of the American Statistical Association. 76 (374): 231–240. doi:10.2307/2287816. JSTOR 2287816.
↑ P. McCullagh and J. A. Nelder (2000). "Chapter 13: Models for Survival Data". Generalized Linear Models (Second ed.). Boca Raton, Florida: Chapman & Hall/CRC. ISBN 0-412-31760-5. (Second edition 1989; first CRC reprint 1999.)
↑ Tibshirani, R. (1997). "The Lasso method for variable selection in the Cox model". Statistics in Medicine. 16 (4): 385–395. doi:10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3.
↑ Bradić, J.; Fan, J.; Jiang, J. (2011). "Regularization for Cox's proportional hazards model with NP-dimensionality". Annals of Statistics. 39 (6): 3092–3120. doi:10.1214/11-AOS911.
↑ Bradić, J.; Song, R. (2015). "Structured Estimation in Nonparametric Cox Model". Electronic Journal of Statistics. 9 (1): 492–534. doi:10.1214/15-EJS1004.
↑ Kong, S.; Nan, B. (2014). "Non-asymptotic oracle inequalities for the high-dimensional Cox regression via Lasso". Statistica Sinica. 24 (1): 25–42. doi:10.5705/ss.2012.240.
↑ Huang, J.; Sun, T.; Ying, Z.; Yu, Y.; Zhang, C. H. (2013). "Oracle inequalities for the lasso in the Cox model". The Annals of Statistics. 41 (3): 1142–1165. doi:10.1214/13-AOS1098.

References

Bagdonavicius, V.; Levuliene, R.; Nikulin, M. (2010). "Goodness-of-fit Criteria for the Cox model from Left Truncated and Right Censored Data". Journal of Mathematical Sciences. 167 (4): 436–443. doi:10.1007/s10958-010-9929-6.
Cox, D. R.; Oakes, D. (1984). Analysis of Survival Data. New York: Chapman & Hall. ISBN 041224490X.
Collett, D. (2003). Modelling Survival Data in Medical Research (2nd ed.). Boca Raton: CRC. ISBN 1584883251.
Gouriéroux, Christian (2000). "Duration Models". Econometrics of Qualitative Dependent Variables. New York: Cambridge University Press. pp. 284–362. ISBN 0-521-58985-1.
Singer, Judith D.; Willett, John B. (2003). "Fitting Cox Regression Models". Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press. pp. 503–542. ISBN 0-19-515296-4.
Therneau, T. M.; Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. New York: Springer. ISBN 0387987843.


This article is issued from Wikipedia - version of the 11/18/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.