#LyX 2.0 created this file. For more info see http://www.lyx.org/
\lyxformat 413
\begin_document
\begin_header
\textclass article
\begin_preamble
\usepackage{ragged2e}
\RaggedRight
\setlength{\parindent}{20pt}
\end_preamble
\use_default_options false
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman times
\font_sans helvet
\font_typewriter courier
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize 12
\spacing single
\use_hyperref false
\papersize default
\use_geometry true
\use_amsmath 1
\use_esint 0
\use_mhchem 0
\use_mathdots 1
\cite_engine basic
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\use_refstyle 0
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1in
\topmargin 1in
\rightmargin 1in
\bottommargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
Duration Models #1 v.4
\end_layout
\begin_layout Author
Paul Johnson
\end_layout
\begin_layout Section
Let's get some math out of the way
\end_layout
\begin_layout Standard
All duration--or
\begin_inset Quotes eld
\end_inset
hazard
\begin_inset Quotes erd
\end_inset
--models are based on the idea that, as time passes, the object under study
can undergo an
\begin_inset Quotes eld
\end_inset
event
\begin_inset Quotes erd
\end_inset
.
The event might be death, quitting, getting fired, etc.
At a given time, the probability of an event is defined as
\begin_inset Formula $f(t)$
\end_inset
.
\end_layout
\begin_layout Subsection
Event probability
\end_layout
\begin_layout Standard
If the probability of an event is constant, then the model would look like
this:
\end_layout
\begin_layout Standard
\begin_inset VSpace 0.3cm
\end_inset
\end_layout
\begin_layout Standard
\align center
\begin_inset Graphics
filename importfigs/figure1.eps
\end_inset
\end_layout
\begin_layout Standard
\begin_inset VSpace 0.3cm
\end_inset
\end_layout
\begin_layout Standard
That is a
\begin_inset Quotes eld
\end_inset
probability density function
\begin_inset Quotes erd
\end_inset
; it describes how probability is spread over the interval from 0 to the
end point.
\end_layout
\begin_layout Standard
The
\begin_inset Quotes eld
\end_inset
cumulative distribution function
\begin_inset Quotes erd
\end_inset
, more commonly called just the
\series bold
distribution function
\series default
,
\begin_inset Formula $F(t)$
\end_inset
, tells the probability that an observation will have an event time
\begin_inset Quotes eld
\end_inset
T
\begin_inset Quotes erd
\end_inset
less than a given critical number
\begin_inset Quotes eld
\end_inset
t
\begin_inset Quotes erd
\end_inset
:
\end_layout
\begin_layout Standard
\begin_inset VSpace 0.3cm
\end_inset
\end_layout
\begin_layout Standard
\align center
\begin_inset Graphics
filename importfigs/flatf.eps
\end_inset
\end_layout
\begin_layout Standard
\begin_inset VSpace 0.3cm
\end_inset
\end_layout
\begin_layout Standard
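\begin_inset Formula
\[
F(t)=Prob(T<t)
\]
\end_inset
The
\begin_inset Quotes eld
\end_inset
survivor function
\begin_inset Quotes erd
\end_inset
,
\begin_inset Formula $S(t)$
\end_inset
, is the complement, the probability that the event time exceeds
\begin_inset Formula $t$
\end_inset
:
\begin_inset Formula
\begin{equation}
S(t)=Prob(T>t)=1-F(t)\label{eq:Survivor1}
\end{equation}
\end_inset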
\end_layout
\begin_layout Standard
Note that by definition,
\begin_inset Formula
\begin{equation}
\frac{\partial S(t)}{\partial t}=-\frac{\partial F(t)}{\partial t}=-f(t)\label{eq:Survivor2}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
And, because
\begin_inset Formula $dln(f)/dx=\frac{1}{f(x)}\frac{df}{dx}$
\end_inset
, it is also the case that
\begin_inset Formula
\begin{equation}
\frac{\partial lnS(t)}{\partial t}=\frac{1}{S(t)}\frac{\partial S(t)}{\partial t}\label{eq:logSurvivor1}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
and if you connect the dots between these, you see
\begin_inset Formula
\begin{equation}
\frac{\partial lnS(t)}{\partial t}=-\frac{f(t)}{S(t)}\label{eq:logSurvivor2}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
The rate of change in the log of the survivor function is the negative of
the hazard function, which is defined next.
\end_layout
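\begin_layout Standard
As a quick check on these identities, here is a sketch that assumes the flat
density pictured above is spread evenly over an interval from 0 to an end
point
\begin_inset Formula $E$
\end_inset
(the symbol
\begin_inset Formula $E$
\end_inset
is introduced only for this illustration).
Then, for
\begin_inset Formula $0\leq t<E$
\end_inset
,
\begin_inset Formula
\[
f(t)=\frac{1}{E},\qquad F(t)=\frac{t}{E},\qquad S(t)=1-\frac{t}{E}=\frac{E-t}{E}
\]
\end_inset
and differentiating the log of the survivor function gives
\begin_inset Formula
\[
\frac{\partial lnS(t)}{\partial t}=\frac{\partial}{\partial t}\left[ln(E-t)-ln(E)\right]=-\frac{1}{E-t}=-\frac{f(t)}{S(t)}
\]
\end_inset
which agrees with equation
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:logSurvivor2"
\end_inset
.
\end_layout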
\begin_layout Subsection
Hazard function
\end_layout
\begin_layout Standard
Now, suppose I told you that an individual had survived until time
\begin_inset Formula $m_{1}$
\end_inset
(m is for milestone).
Take a look at the next picture and tell me what you can infer from it:
\end_layout
\begin_layout Standard
\begin_inset VSpace 0.3cm
\end_inset
\end_layout
\begin_layout Standard
\align center
\begin_inset Graphics
filename importfigs/mflatF.eps
\end_inset
\end_layout
\begin_layout Standard
\begin_inset VSpace 0.3cm
\end_inset
\end_layout
\begin_layout Standard
If I told you somebody lived until
\begin_inset Formula $m_{1}$
\end_inset
, then you could make a more accurate statement about their probability
of exit at
\begin_inset Formula $m_{1}$
\end_inset
than you could before I told you that.
If I did not give you that information, you could only say their probability
of exit is
\begin_inset Formula $f(m_{1}).$
\end_inset
But now, you know more! You know the probability they will exit at that
time has to use
\begin_inset Formula $S(m_{1})$
\end_inset
as a denominator, since that is the total amount of
\begin_inset Quotes eld
\end_inset
risk
\begin_inset Quotes erd
\end_inset
left after reaching time
\begin_inset Formula $m_{1}$
\end_inset
.
\end_layout
\begin_layout Standard
Given that a unit lasts until
\begin_inset Formula $t$
\end_inset
, the probability that the unit will
\begin_inset Quotes eld
\end_inset
exit
\begin_inset Quotes erd
\end_inset
at that time is
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
h(t)=\frac{f(t)}{S(t)}\label{eq:hazard1}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
Harrell uses
\begin_inset Formula $\lambda(t)$
\end_inset
instead of
\begin_inset Formula $h(t)$
\end_inset
in his notation.
Hazard is sometimes given exciting names, like the
\emph on
instantaneous event
\emph default
or
\emph on
instantaneous failure rate.
\emph default
\end_layout
\begin_layout Standard
In light of equation
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:logSurvivor2"
\end_inset
, it is also true that:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
h(t)=-\frac{\partial ln\, S(t)}{\partial t}\label{eq:hazard2}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
I tend to think of this as a pure probability exercise.
The total probability of an exit at or after
\begin_inset Formula $t$
\end_inset
is
\begin_inset Formula $S(t)$
\end_inset
and the unconditional chance of exiting at
\begin_inset Formula $t$
\end_inset
is
\begin_inset Formula $f(t)$
\end_inset
, so the conditional chance of exiting is
\begin_inset Formula $h(t).$
\end_inset
\end_layout
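\begin_layout Standard
To put numbers on that, here is a small worked sketch.
It assumes, purely for illustration, a flat density on the interval from
0 to 10, so that
\begin_inset Formula $f(t)=1/10$
\end_inset
and
\begin_inset Formula $S(t)=1-t/10$
\end_inset
; these numbers are invented for the example.
At
\begin_inset Formula $t=5$
\end_inset
,
\begin_inset Formula
\[
h(5)=\frac{f(5)}{S(5)}=\frac{0.1}{0.5}=0.2
\]
\end_inset
and at
\begin_inset Formula $t=8$
\end_inset
,
\begin_inset Formula
\[
h(8)=\frac{f(8)}{S(8)}=\frac{0.1}{0.2}=0.5
\]
\end_inset
So even though the unconditional density is flat, the conditional chance
of exit rises as time passes, because the same
\begin_inset Formula $f(t)$
\end_inset
is divided by a shrinking
\begin_inset Formula $S(t)$
\end_inset
.
\end_layout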
\begin_layout Standard
If you want a formal proof of equation (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hazard1"
\end_inset
), consult Harrell, p.
395.
Hazard is the probability that an
\begin_inset Quotes eld
\end_inset
event
\begin_inset Quotes erd
\end_inset
occurring at time T will happen in a really small interval of time between
\begin_inset Formula $t$
\end_inset
and
\begin_inset Formula $t+\Delta t$
\end_inset
, divided by the probability that the event happens after
\begin_inset Formula $t$
\end_inset
:
\end_layout
\begin_layout Standard
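\begin_inset Formula
\begin{equation}
h(t)=\frac{\lim_{\Delta t\rightarrow0}\frac{Prob(t<T<t+\Delta t)}{\Delta t}}{Prob(T>t)}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
[...]
\begin_inset Formula $>1$
\end_inset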
then the hazard increases from 0 to
\begin_inset Formula $\lambda$
\end_inset
as time goes from 0 to
\begin_inset Formula $\infty$
\end_inset
.
\end_layout
\begin_layout Subsection
Nonparametric survival curves: K-M curves
\end_layout
\begin_layout Standard
I'll write something here someday.
\end_layout
\begin_layout Section
Proportional Hazards: finally some input variables
\end_layout
\begin_layout Standard
David Cox made the proportional hazards model famous.
He pioneered many areas in modern statistics.
The
\begin_inset Quotes eld
\end_inset
Cox Proportional Hazards
\begin_inset Quotes erd
\end_inset
model is a specific strategy for estimating regression coefficients.
\end_layout
\begin_layout Standard
Caution: Not all proportional hazards models are CPH models.
\end_layout
\begin_layout Subsection
General Proportional Hazard
\end_layout
\begin_layout Standard
The proportional hazards model assumes that the
\begin_inset Quotes eld
\end_inset
time dependent
\begin_inset Quotes erd
\end_inset
hazard is multiplied by the part that depends on the input variables:
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
h(t)=\lambda(t|X_{i})=\lambda_{0}(t)\cdot f(X_{i},b)
\]
\end_inset
The hazard is a function of time which reflects a
\begin_inset Quotes eld
\end_inset
baseline hazard
\begin_inset Quotes erd
\end_inset
\begin_inset Formula $\lambda_{0}(t)$
\end_inset
multiplied against a function of the input variables.
\end_layout
\begin_layout Standard
Please note the significance.
\end_layout
\begin_layout Quote
\series bold
Hazard separates into two parts
\series default
, an individual dependent part and a part that depends only on time.
\end_layout
\begin_layout Standard
I like this notation
\begin_inset Formula $\lambda(t|X)$
\end_inset
as a way of remembering that hazard is separate from the function
\begin_inset Formula $\lambda_{0}(t)$
\end_inset
.
It is very common, but not absolutely necessary, to assume
\begin_inset Formula
\[
f(X,b)=exp\left(Xb\right)
\]
\end_inset
\begin_inset Newline newline
\end_inset
So the general definition of a proportional hazards model is
\begin_inset Formula
\begin{equation}
h(t)=\lambda_{0}(t)\cdot exp(Xb)\label{eq:propor1}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Fiddle that around in various ways:
\begin_inset Formula
\[
h(t)=e^{ln\lambda_{0}(t)+Xb}
\]
\end_inset
\begin_inset Newline newline
\end_inset
So you can think of the hazard rate as being the exponentiated sum of the
logged time-related element and the weighted input variables.
In other words, if you did have the hazard value as an observed quantity,
and you wanted to use it in a regression model, you would need the logged
baseline hazard as an input.
\end_layout
\begin_layout Standard
To refer to a case i, we might write
\begin_inset Formula
\begin{equation}
h_{i}(t)=\lambda_{0}(t)\cdot exp(X_{i}b)\label{eq:propor2}
\end{equation}
\end_inset
Note the premise here is that all cases at time
\begin_inset Formula $t$
\end_inset
share a certain amount of hazard,
\begin_inset Formula $\lambda_{0}(t)$
\end_inset
, and then there is case-specific customization with
\begin_inset Formula $exp(X_{i}b)$
\end_inset
.
\end_layout
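\begin_layout Standard
One consequence is worth spelling out, as a sketch.
Take two cases
\begin_inset Formula $i$
\end_inset
and
\begin_inset Formula $j$
\end_inset
(the second index
\begin_inset Formula $j$
\end_inset
is introduced only for this illustration) and divide their hazards.
The shared baseline cancels,
\begin_inset Formula
\[
\frac{h_{i}(t)}{h_{j}(t)}=\frac{\lambda_{0}(t)\cdot exp(X_{i}b)}{\lambda_{0}(t)\cdot exp(X_{j}b)}=exp\left((X_{i}-X_{j})b\right)
\]
\end_inset
so the ratio does not depend on
\begin_inset Formula $t$
\end_inset
.
For instance, with a single input variable and hypothetical values
\begin_inset Formula $b=0.5$
\end_inset
,
\begin_inset Formula $X_{i}=1$
\end_inset
and
\begin_inset Formula $X_{j}=0$
\end_inset
, the ratio is
\begin_inset Formula $exp(0.5)\approx1.65$
\end_inset
, so case
\begin_inset Formula $i$
\end_inset
carries about 65 percent more hazard than case
\begin_inset Formula $j$
\end_inset
at every point in time.
\end_layout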
\begin_layout Section
The Cox Proportional Hazards model: nonparametric approach
\end_layout
\begin_layout Standard
There is a division of opinion on this.
Box-Steffensmeier and Jones cite authorities who discourage the use of
the CPH model, whereas Harrell seems to be more enthusiastic.
Take your pick, apply your diagnostics.
\end_layout
\begin_layout Standard
Some people call this a nonparametric approach because we don't end up
estimating the baseline hazard at all.
In fact, we don't even end up using the precise time-to-event data.
We only use the ranking of the event times.
But I prefer to say it is semiparametric, because we do estimate the
\begin_inset Formula $b$
\end_inset
's.
\end_layout
\begin_layout Subsection
About conditional probability
\end_layout
\begin_layout Standard
Suppose the following events might happen: {A,B,C,D,E,F}.
Suppose these are mutually exclusive events, exactly one of which will happen,
and the probability of each one is given by P(A), P(B), ..., P(F), so that
these sum up to 1.0.
\end_layout
\begin_layout Standard
Suppose you are told that either A, B, or C happened.
What is the probability that the thing which happened was A?
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
P(A|A\, or\, B\, or\, C)=\frac{P(A)}{P(A)+P(B)+P(C)}\label{eq:Conditional}
\end{equation}
\end_inset
\end_layout
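\begin_layout Standard
For a concrete illustration, with made-up numbers: suppose
\begin_inset Formula $P(A)=0.1$
\end_inset
,
\begin_inset Formula $P(B)=0.2$
\end_inset
, and
\begin_inset Formula $P(C)=0.3$
\end_inset
, with the remaining 0.4 spread over D, E, and F.
Then
\begin_inset Formula
\[
P(A|A\, or\, B\, or\, C)=\frac{0.1}{0.1+0.2+0.3}=\frac{0.1}{0.6}\approx0.167
\]
\end_inset
Learning that the outcome was one of A, B, or C raises the chance of A from
0.1 to about 0.167.
\end_layout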
\begin_layout Subsection
Cox's model depends on ordering and conditional probability
\end_layout
\begin_layout Standard
Cox's argument was that we should not worry about the function
\begin_inset Formula $\lambda_{0}(t).$
\end_inset
It's not our main focus.
We want regression coefficients!
\end_layout
\begin_layout Standard
So, how can we make
\begin_inset Formula $\lambda_{0}(t)$
\end_inset
disappear? The event time for a case
\begin_inset Formula $i$
\end_inset
is
\begin_inset Formula $T_{i}$
\end_inset
and the
\begin_inset Formula $risk\, Set$
\end_inset
at time
\begin_inset Formula $t$
\end_inset
is the set of observations (
\begin_inset Formula $j\in risk\, Set)$
\end_inset
such that the event has not occurred before that time, meaning
\begin_inset Formula $T_{j}\geq t$
\end_inset
.
Let's figure out the probability that observation 1 will be the first one
to have an event.
Observe (following the logic in
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:Conditional"
\end_inset
):
\begin_inset Formula
\begin{equation}
\frac{h_{1}(t)}{\sum_{i\in risk\, Set}h_{i}(t)}=\frac{\lambda_{0}(t)\cdot exp(X_{1}b)}{\sum_{i\in risk\, Set}\lambda_{0}(t)\cdot exp(X_{i}b)}=\frac{exp(X_{1}b)}{\sum_{t