\batchmode
\makeatletter
\def\input@path{{/home/pauljohn/SVN/SVN-guides/stat/Distributions/DistributionWriteups/NegativeBinomial//}}
\makeatother
\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage{geometry}
\geometry{verbose,tmargin=1in,bmargin=1in,lmargin=1in,rmargin=1in}
\usepackage{url}
\usepackage{setspace}
\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
\usepackage{Sweavel}
<>=
if(exists(".orig.enc")) options(encoding = .orig.enc)
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
\usepackage{lmodern}
\usepackage{Sweavel}
\usepackage{graphicx}
\usepackage{color}
\usepackage[samesize]{cancel}
\usepackage{ifthen}
\makeatletter
\renewenvironment{figure}[1][]{%
\ifthenelse{\equal{#1}{}}{%
\@float{figure}
}{%
\@float{figure}[#1]%
}%
\centering
}{%
\end@float
}
\renewenvironment{table}[1][]{%
\ifthenelse{\equal{#1}{}}{%
\@float{table}
}{%
\@float{table}[#1]%
}%
\centering
% \setlength{\@tempdima}{\abovecaptionskip}%
% \setlength{\abovecaptionskip}{\belowcaptionskip}%
% \setlength{\belowcaptionskip}{\@tempdima}%
}{%
\end@float
}
%\usepackage{listings}
\lstset{tabsize=2, breaklines=true,style=Rstyle}
% In document Latex options:
\fvset{listparameters={\setlength{\topsep}{0em}}}
\def\Sweavesize{\normalsize}
\def\Rcolor{\color{black}}
\def\Rbackground{\color[gray]{0.95}}
\makeatother
\usepackage{babel}
\begin{document}
<>=
unlink("plots", recursive=T)
dir.create("plots", showWarnings=F)
@
% In document Latex options:
\fvset{listparameters={\setlength{\topsep}{0em}}}
\SweaveOpts{prefix.string=plots/t,split=T,ae=F,height=6,width=6}
<>=
options(width=100, prompt=" ", continue=" ")
options(useFancyQuotes = FALSE)
set.seed(12345)
op <- par()
options(SweaveHooks=list(fig=function() par(ps=12)))
pdf.options(onefile=F,family="Times",pointsize=12)
@
\title{Negative Binomial}
\author{Paul Johnson and Andrea Vieux}
\date{June 10, 2013}
\maketitle
\section{Introduction}
The Negative Binomial is discrete probability distribution, one that
can be used for counts or other integer-valued variables. It gives
probabilities for integer values from $0$ to $\infty$. It is sometimes
also known as the Pascal distribution or as Polya's distribution.
The Negative Binomial is best thought of as a variant of a Binomial
distribution. The reader will recall that a \emph{Bernoulli trial}
is a ``coin flip'' experiment, one that returns ``yes'' or ``no'',
``failure'' or ``success.'' The Binomial distribution describes
the number of ``successes'' out of a given number of ``Bernoulli
trials''. As such, its values are defined from $0$ up to the number
of trials.
The Negative Binomial describes the same sequence of Bernoulli trials
that is described by the Binomial distribution. The difference is
``what'' is being described. Whereas Bernoulli tracks the number
of successes, the Negative Binomial represents the number of failures
that occurs at the beginning of the sequence as we wait for a given
number of successes to be achieved.
The most frequent use of the Negative Binomial model has nothing to
do with the property that it describes the chances of a string of
failures. Rather, it is used to describe distributions that represent
counts of events. It is a replacement for the commonly used Poisson
distribution, which is often considered the default distribution for
integer-valued count data. The Poisson has the property that its expected
value equals its variance, a property that is not found in some empirically
observed distributions. The Negative Binomial has a higher variance.
\section{Mathematical Description}
Recall that the Binomial distribution gives the probability of observing
$r$ successes out ot $N$ trials when the probability of observing
a success is fixed at $p$.
\begin{equation}
B(r|N)=\left(\begin{array}{c}
N\\
r
\end{array}\right)p^{r}(1-p)^{N-r}=\frac{N!}{r!(N-r)!}p^{r}(1-p)^{N-r}\label{eq:Binomial}
\end{equation}
\\
The symbol $\left(\begin{array}{c}
N\\
r
\end{array}\right)$ is the ``binomial coefficient'' (from intermediate algebra) and
it is spoken out loud as ``$N$ choose $r$'', meaning the number
of ways one can choose subsets of $r$ from a larger set of $N$.
\begin{equation}
\left(\begin{array}{c}
N\\
r
\end{array}\right)=\frac{N!}{N!(N-r)!}\label{eq:BinCoef}
\end{equation}
\\
Thinking of one particular sequence the $N$ Bernoulli trials, one
that has $r$ successes, we can easily calculate the probabilities.
The chance of observing one particular sequence, $S,S,F,S,F,\ldots S$,
will be $p\cdot p\cdot(1-p)\cdot p\cdot(1-p)\ldots p$. Regrouping
indicates the probability of that particular sequence is $p^{r}(1-p)^{N-r}$
. That is the probability of one particular sequence that has $r$
successes. Put together the chances of all such sequences, of which
there are $\left(\begin{array}{c}
N\\
r
\end{array}\right)$, and the formula in \ref{eq:Binomial} should be clearly understood.
The Negative Binomial is defined with reference to the Binomial. The
R help page for the rbinom function describes it as ``the number
of failures which occur in a sequence of Bernoulli trials before a
target number of successes is reached'' (R documentation).
We found the description on the Math World web site (\url{http://mathworld.wolfram.com/NegativeBinomialDistribution.html})
to have more intuitive appeal. The Negative Binomial Distribution
gives ``the probability of r-1 successes and x failures in x+r-1
trials, and success on the (x+r) th trial.''
Take the first part of the definition, $(r-1)$ successes out of $(x+r-1)$
trials. By the Binomial distribution, that is
\begin{equation}
Prob(r-1\mid x+r-1)=\frac{(x+r-1)!}{(r-1)!(x)!}p^{r-1}(1-p)^{x}\label{eq:Binr-1}
\end{equation}
\\
The probability of observing a success on the last trial, the $(x+r)$th
trial, is $p$. So, by the multiplication rule of probabilities, we
multiply the previous equation \ref{eq:Binr-1} by $p$ to obtain
\begin{equation}
NB(x\mid r,p)=\frac{(x+r-1)!}{(r-1)!(x)!}p^{r}(1-p)^{x}\label{eq:Binr-2}
\end{equation}
\\
Or, if you prefer to write it with the binomial coefficient,
\begin{equation}
NB(x\mid r,p)=\left(\begin{array}{c}
x+r-1\\
r-1
\end{array}\right)p^{r}(1-p)^{x}\label{eq:Binr-3}
\end{equation}
\section{Central Tendency and Dispersion}
If $x$ has a Negative Binomial distribution, then $x$ can take on
integer values ranging from $0$ to $\infty$. The upper limit is
infinity because we are letting the process run until there are exactly
$(r-1)$ failures and then we conduct one additional experiment on
which we obtain success.
The expected value and variance are:
\begin{equation}
E[x]=\frac{r(1-p)}{p}
\end{equation}
\begin{equation}
Var(x)=\frac{r(1-p)}{p^{2}}
\end{equation}
\\
and the skewness and kurtosis are
\[
Skewness(x)=\frac{2-p}{\sqrt{(1-p)r}}
\]
\[
Kurtosis(x)=\frac{p^{2}-6p+6}{r(1-p)}
\]
These are worth considering because of their dependence on $r$. As
one would expect, the expected number of successes is increasing in
the number of observed failures. However, the important effects of
$r$ are seen in the variance and skewness. As $r$ increases, the
variance increases. But the skewness shrinks. Hence, for small values
of $r$, we should expect to see a very skewed distribution, whereas
for higher values of $r$, the distribution should be more symmetrical.
\section{Illustrations}
The following code is for Figure \ref{cap:Negative-Binomial}, which
displays example distributions as a function of a target number of
successes and probabilities for successes on individual trials. Recall
that the probabilities indicate the chances of observing a given number
of failures after observing a given number of successes.
<>=
drawBinom <- function (S, p){
x<- seq(0,200)
xprob <- dnbinom(x,size=S,prob=p)
mytitle <- paste("x",S,p)
plot(x, xprob,type="s",main=mytitle,xlab=paste("number of failures before ",S,"successes"))
}
par(mfcol=c(3,2))
sizes <- c(20,40,60)
pvals <- c(0.33,0.67)
for (i in 1:2){
for (j in 1:3){
drawBinom (sizes[j], pvals[i] )
}
}
@
\begin{figure}
\caption{Negative Binomial\label{cap:Negative-Binomial}}
<>=
<>
@
\end{figure}
\begin{doublespace}
In the Figure \ref{cap:Negative-Binomial} we can see how changes
in the size and probability alter the Negative Binomial Distribution.
The size moves the graph right to left, and the probability widens
or shrinks the bell shape. For a graph which resembles the Normal
Distribution, refer to Figure \ref{cap:Negative-Binomial-2}.
\end{doublespace}
\begin{figure}
\caption{Negative Binomial 2\label{cap:Negative-Binomial-2}}
<>=
x7 <- seq(0, 100)
xprob7 <- dnbinom(x7, size = 50, prob = 0.5)
plot(xprob7, type="s")
@
\end{figure}
\section{The Negative Binomial as a Mixture Distribution}
From J. Scott Long's book (Citation ??):
\begin{doublespace}
The negative binomial distribution is a useful tool in the analysis
of count data. It differs from the Poisson Distribution in that it
allows for the conditional variance to exceed the conditional mean.
The negative binomial distribution is the end result of a process
which begins with a Poisson distribution with mean $\lambda$. The
fixed ($\lambda$) is replaced with a random variable that has a mean
of $\lambda$, but has nonzero variance. With this new framework,
we are representing the possibility that there is unobserved heterogeneity.
\end{doublespace}
Recall the definition of the Poisson gives the probability of observing
$x$ events as a function of a parameter $\lambda$.
\begin{equation}
Poisson(x|\lambda)=\frac{\lambda^{x}e^{-\lambda}}{x!}\label{eq:ConditPoisson}
\end{equation}
Instead of thinking of $\lambda$ as a constant, think of it as a
random variable, representing the fact that the average count is likely
to vary across experiments (counties, cities, etc). If we think of
$\lambda$ as a variable, then the probability of observing $x$ failures
has to be the result of a 2 part process. Part 1 chooses $\lambda$
and part 2 chooses $x$. That means there are possibly many different
routes through which we can observe a particular value $x$, since
$\lambda$ can vary from $0$ to $\infty$, and for each possible
value of $\lambda$, there is ``some chance'' of observing any positive
value.
\begin{equation}
\int P(x|\lambda)p(\lambda)d\lambda\label{eq:mixture}
\end{equation}
If one chooses ``just the right'' distribution for $\lambda$, then
the ``mixture'' distribution will be Negative Binomial.
Suppose we take the lucky guess that $\lambda$ has a Gamma probability
distribution
\begin{equation}
Prob(\lambda)=\frac{1}{\beta^{\alpha}\Gamma(\alpha)}\lambda^{(\alpha-1)}e^{-\lambda/\beta}
\end{equation}
\\
The ``shape'' parameter is $\alpha$ and the scale parameter is
$\beta$, and the expected value is $\alpha\beta$ and the variance
is $\alpha\beta^{2}$.
Inserting the given distributions into \ref{eq:mixture}, we are able
to calculate the marginal distribution of $x$.
\begin{equation}
Prob(x)=\int_{0}^{\infty}\frac{e^{-\lambda}\lambda^{x}}{x!}\frac{1}{\beta^{\alpha}\Gamma(\alpha)}\lambda^{(\alpha-1)}e^{-\lambda/\beta}d\lambda
\end{equation}
\begin{equation}
=\frac{1}{x!\beta^{\alpha}\Gamma(\alpha)}\int_{0}^{\infty}e^{-\lambda}\lambda^{x}\lambda^{(\alpha-1)}e^{-\lambda/\beta}d\lambda
\end{equation}
\begin{equation}
=\frac{1}{x!\beta^{\alpha}\Gamma(\alpha)}\int_{0}^{\infty}\lambda^{x+\alpha-1}e^{-\lambda}e^{-\lambda/\beta}d\lambda
\end{equation}
\begin{equation}
=\frac{1}{x!\beta^{\alpha}\Gamma(\alpha)}\int_{0}^{\infty}\lambda^{x+\alpha-1}e^{-\lambda(1+1/\beta)}d\lambda\label{eq:brute4}
\end{equation}
I have tried to simplify this through brute force, but have been unable
to do so. I feel sure there is a specialized result in calculus that
will help to use the Gamma function, which would be adapted to this
case as:
\begin{equation}
\Gamma(x+\alpha)=\int_{0}^{\infty}\lambda^{x+\alpha-1}e^{-\lambda}d\lambda
\end{equation}
\\
And the end result, the Negative Binomial distribution, is stated
either as
\begin{equation}
=\frac{\Gamma(\alpha+x)}{x!\Gamma(\alpha)}\left(\frac{1}{1+\beta}\right)^{\alpha}\left(\frac{\beta}{1+\beta}\right)^{x}
\end{equation}
\\
or
\begin{equation}
=\frac{\Gamma(\alpha+x)}{x!\Gamma(\alpha)}\left(\frac{1}{1+\beta}\right)^{\alpha}\left(1-\frac{1}{1+\beta}\right)^{x}\label{eq:mixture6}
\end{equation}
This is the form of the Negative Binomial that was discussed earlier.
I have been unable to find the brute force method to make the jump
from \ref{eq:brute4} to \ref{eq:mixture6}, but in Gelman, Carlin,
Stern, and Rubin's Bayesian Data Analysis, 2ed (p. 53), an alternative
derivation is offered. Recall Bayes's theorem
\begin{equation}
p(\lambda|x)=\frac{p(x|\lambda)p(\lambda)}{p(x)}\label{eq:Bayes1}
\end{equation}
\\
Rearrange that
\begin{equation}
p(x)=\frac{p(x|\lambda)p(\lambda)}{p(\lambda|x)}\label{eq:Bayes2}
\end{equation}
\\
The left hand side is our target--the marginal distribution of $x$.
We already know formulae for the two components in the numerator,
but the denominator requires a result from probability theory.
Most statistics books--and all Bayesian statistics books--will have
a result about updating from a Poisson distribution. If $x$ is a
Poisson variable with parameter $\lambda$, and the conjugate prior
for $\lambda$is the Gamma distribution, $Gamma(\lambda|\alpha,\beta)$
, then Bayes's theorem (expression \ref{eq:Bayes1}), leads to the
conclusion that if $x$ ``events'' are observed, then
\begin{equation}
p(\lambda|x)=Gamma(\lambda|\alpha+x,\beta+1)
\end{equation}
\\
That is the last ingredient we need to solve expression \ref{eq:Bayes2}.
Gelman et al put it this way:
\begin{equation}
p(x)=\frac{Poisson(x|\lambda)Gamma(\lambda|\alpha,\beta)}{Gamma(\lambda|\alpha+x,1|\beta)}
\end{equation}
\\
Writing in the expressions obtained above,
\begin{equation}
p(x)=\frac{\frac{\lambda^{x}e^{-\lambda}}{x!}\cdot\frac{1}{\beta^{\alpha}\Gamma(\alpha)}\lambda^{(\alpha-1)}e^{-\lambda/\beta}}{\frac{1}{(1+\beta)^{(\alpha+x)}\Gamma(\alpha+x)}\lambda^{(\alpha+x-1)}e^{-\frac{\lambda}{(1+\beta)}}}
\end{equation}
\\
After cancellation and rearrangement, that simplifies to:
\begin{equation}
p(x)=\frac{\Gamma(\alpha+x)}{\Gamma(\alpha)x!}\cdot\frac{\beta^{\alpha}}{(1+\beta)^{\alpha+x}}
\end{equation}
\\
Using the definition of $\Gamma(x)$ and rearranging again, we find:
\begin{equation}
=\frac{(\alpha+x-1)!}{(\alpha-1)!x!}\cdot\left(\frac{\beta}{1+\beta}\right)^{\alpha}\left(\frac{1}{1+\beta}\right)^{x}
\end{equation}
\\
Recognizing that the first term on the left is the Binomial coefficient,
the derivation is complete.
We choose the shape and scale parameters so that the Gamma distributed
value of $\lambda$ will allow us to simplify\ref{eq:mixture6}. It
appears obvious in this expression that the role of the probability
of success on an individual trial is $p=\frac{1}{1+\beta}$ and that
$\alpha$ is the ``desired number of successes'' in the Negative
Binomial. That means that, if want to set the Gamma parameters equal
to the proper levels, we choose $\beta=(1-p)/p$ and $\alpha=n$ (the
number of desired successes).
If $n$ represents the number of successes and the number of failures
is $x=N-n$
\[
Prob(x)=\frac{\Gamma(x+n)}{\Gamma(n)x!}p^{n}(1-p)^{x}
\]
\subsection*{Another way to think of the mixture}
Instead of replacing $\lambda$ in the Poisson definition, suppose
we think of multiplying it by a randomly chosen value $\delta$. If
$\delta$ has an expected value of $1$, then $E(\lambda\delta)=\lambda$,
so, ``on average,'' the original $\lambda$ is reproduced. However,
because $\delta$ might be higher or lower, then $\lambda\delta$
will have random variation, and so the number of observed events will
fluctuate. Its average will still be $\lambda,$ but it will have
greater variance. In such a way, one can see the Poisson in this way
(replacing $\lambda$ by $\lambda\delta$).
\[
P(x\mid\delta)=\frac{e^{-(\lambda\delta)}(\lambda\delta)^{x}}{x!}
\]
Here is the question: how can one select a mathematically workable
model for $\delta$ so that $E(\delta)=1$?
I've seen this done in several ways. Recall that the Gamma distribution
has expected value $\alpha\beta$ and variance $\alpha\beta^{2}$.
So if we drew a variate $y$ from any Gamma distribution and then
divided by $\alpha\beta,$ the result would be
\[
x/\alpha\beta
\]
\\
and the expected value would be $E(x/\alpha\beta)=E(y)/\alpha\beta=1$.
More commonly, attention is focused on a subset of possible Gamma
distributions, the ones for which the $\beta=1/\alpha$. When $\delta$
follows a Gamma distribution $Gamma(\delta|\alpha,1/\alpha)$, then
it has an expected value of 1 and its variance is $1/\alpha$. As
$\alpha$ tends to 0, the variance tends toward infinity.
Think of $\lambda$ as an ``attribute'' of an observation. If $\delta$
is a Gamma variate with mean 1 and variance $1/\alpha$, then $\lambda\delta$
is also a Gamma variate, with mean $\lambda$ and variance $\lambda^{2}/\alpha$.
Maybe it is just a lucky guess, but it appears to me that $\lambda\delta$
would have the distribution $Gamma(\alpha,\lambda/\alpha)$.
\end{document}