#LyX 2.0 created this file. For more info see http://www.lyx.org/
\lyxformat 413
\begin_document
\begin_header
\textclass rliteratearticle
\begin_preamble
\usepackage{Sweavel}
\usepackage{graphicx}
\usepackage{color}
\def\Sweavesize{\normalsize}
\def\Rcolor{\color{black}}
\def\Rbackground{\color[gray]{0.95}}
%%Centers contents of figures and tables
\usepackage{ifthen}
\renewenvironment{figure}[1][]{%
\ifthenelse{\equal{#1}{}}{%
\@float{figure}
}{%
\@float{figure}[#1]%
}%
\centering
}{%
\end@float
}
\renewenvironment{table}[1][]{%
\ifthenelse{\equal{#1}{}}{%
\@float{table}
}{%
\@float{table}[#1]%
}%
\centering
}{%
\end@float
}
\end_preamble
\use_default_options false
\begin_modules
sweave
\end_modules
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman lmodern
\font_sans lmss
\font_typewriter lmtt
\font_default_family rmdefault
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\float_placement H
\paperfontsize 12
\spacing single
\use_hyperref false
\papersize default
\use_geometry true
\use_amsmath 1
\use_esint 0
\use_mhchem 1
\use_mathdots 1
\cite_engine basic
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\use_refstyle 0
\branch R
\selected 1
\filename_suffix 0
\color #faf0e6
\end_branch
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1in
\topmargin 1in
\rightmargin 1in
\bottommargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
unlink("plots", recursive=T)
\end_layout
\begin_layout Plain Layout
unlink("DistributionReview.pdf")
\end_layout
\begin_layout Plain Layout
dir.create("plots", showWarnings=F)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
% In document Latex options:
\end_layout
\begin_layout Plain Layout
\backslash
fvset{listparameters={
\backslash
setlength{
\backslash
topsep}{0em}}}
\end_layout
\begin_layout Plain Layout
\backslash
SweaveOpts{prefix.string=plots/t,split=T,ae=F,height=3,width=5}
\end_layout
\begin_layout Plain Layout
\backslash
def
\backslash
Sweavesize{
\backslash
normalsize}
\end_layout
\begin_layout Plain Layout
\backslash
def
\backslash
Rcolor{
\backslash
color{black}}
\end_layout
\begin_layout Plain Layout
\backslash
def
\backslash
Rbackground{
\backslash
color[gray]{0.95}}
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
options(width=100, prompt=" ", continue=" ")
\end_layout
\begin_layout Plain Layout
options(useFancyQuotes = FALSE)
\end_layout
\begin_layout Plain Layout
set.seed(12345)
\end_layout
\begin_layout Plain Layout
op <- par()
\end_layout
\begin_layout Plain Layout
#pjmar <- c(5.1, 4.1, 1.0, 2.1)
\end_layout
\begin_layout Plain Layout
#options(SweaveHooks=list(fig=function() par(mar=pjmar, ps=10)))
\end_layout
\begin_layout Plain Layout
options(SweaveHooks=list(fig=function() par(ps=10)))
\end_layout
\begin_layout Plain Layout
pdf.options(onefile=F,family="Times",pointsize=10)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\begin_layout Title
Distribution Overview: Probability by the Seat of the Pants
\end_layout
\begin_layout Author
Paul Johnson
\end_layout
\begin_layout Section
Why Do We Need Probability Concepts?
\end_layout
\begin_layout Standard
I'll start by challenging you.
\end_layout
\begin_layout Enumerate
Describe the range of test scores you expect when we test my students in U.S.
 politics.
\end_layout
\begin_layout Enumerate
Describe the number of times per month that your neighbor's dog will wiggle
under the fence and escape.
\end_layout
\begin_layout Standard
Most people I know agree that answers like the following are, more or less,
acceptable.
\end_layout
\begin_layout Enumerate
The test scores range from 0 to 100, the proportion of students who earn
each score will be something like this:
\end_layout
\begin_deeper
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<intro10, fig=T>>=
\end_layout
\begin_layout Plain Layout
x <- 50:100
\end_layout
\begin_layout Plain Layout
y <- (dnorm(x, m=70, sd=5) + dnorm(x, m=90, sd=7))/2
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l", lty=4, xlab="Test Scores", ylab="Chance of Observing
Each Score")
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tintro10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
A Probability Distribution of Test Scores
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
I have to be a little bit careful in describing this figure.
The curve represents probabilities, but what does that mean? At the current
time, my best answer is this.
Draw one person's name from a hat (all names equally likely).
Without any additional information, the curve tells me that the most likely
score is approximately 70.
Scores below 70 are less likely, but if we consider 80 and above, the most
likely score is 90.
These beliefs reflect my experience as a teacher.
My class tends to have one big clump of solid C students and a smaller,
 well-defined clump of students for whom the average is A.
When those two groups are combined, a
\begin_inset Quotes eld
\end_inset
two humped camel
\begin_inset Quotes erd
\end_inset
graph emerges.
\end_layout
\end_deeper
\begin_layout Enumerate
The dog will probably not escape at all, but there is a decent chance it
 will escape once, a smaller chance of 2 escapes, and a lower chance still
 for each successive number of escapes.
\end_layout
\begin_deeper
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<intro20, fig=T>>=
\end_layout
\begin_layout Plain Layout
x <- 0:10
\end_layout
\begin_layout Plain Layout
y <- dpois(x, lambda=0.75)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="s", lty=4, xlab="Dog Escapes", ylab="Chance of Observing
 Each Count")
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tintro20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
The Chances of a Given Number of Dog Escapes
\begin_inset CommandInset label
LatexCommand label
name "fig:PoissonDogEscapes"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\end_deeper
\begin_layout Standard
Maybe you want to redraw these answers.
I don't mind.
Beliefs are likely to vary.
The key idea is that we can use models,
\begin_inset Quotes eld
\end_inset
probability distributions,
\begin_inset Quotes erd
\end_inset
to describe our expectations.
They specify the range of observations and the chances that various scores
will be observed.
\end_layout
\begin_layout Standard
Describing what we expect to see in a sample is one of the important purposes
of probability models.
\end_layout
\begin_layout Standard
Now let me challenge you again.
\end_layout
\begin_layout Enumerate
Describe the average score we are likely to observe on the first test in
my US politics course.
\end_layout
\begin_deeper
\begin_layout Standard
I suppose you'll nag me for more information.
This semester, the average on the first test was 78.3.
Last year, the average was 77.1.
Before that, it was 79.2.
I have tenure now, and I expect to impose myself on the students for about
100 more years, so you can wait and give your answer later, if you want
to.
I'll forward the data to you.
\end_layout
\end_deeper
\begin_layout Enumerate
Suppose you keep track of your neighbor's dog escapes for months and months.
What do you expect the average number of escapes per month will be?
\end_layout
\begin_layout Standard
These questions ask us to summarize across a series of observations.
They are a little more abstract, but not too difficult.
There is no right or wrong answer; I am only asking for your opinion.
\end_layout
\begin_layout Standard
Here are my answers.
\end_layout
\begin_layout Enumerate
I believe the test averages will follow this pattern.
\end_layout
\begin_deeper
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<intro30, fig=T>>=
\end_layout
\begin_layout Plain Layout
x <- 68:87
\end_layout
\begin_layout Plain Layout
y <- dnorm(x, m=78, sd=2)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l", xlim=c(60,95), lty=4, xlab="Mean Test Scores", ylab="Chance
of Observing Means")
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tintro30}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
My Beliefs About the Class Mean
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
I think the average is most likely to be 78, and I'm very confident it will
be in a range from 74 to 82.
There's still a chance it might be more extreme, but I'm pretty doubtful.
This distribution has one hump.
I've not drawn it that way by mistake.
It seems that when we average the scores of a class, we smooth out the
bumps of the score distribution.
The distribution of the means is simpler.
\end_layout
\end_deeper
\begin_layout Enumerate
I expect the average number of dog escapes will be between 0 and 1, and
there's a very small chance that it will be greater than 1.
And, obviously, it cannot be negative.
\end_layout
\begin_deeper
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<intro40, fig=T>>=
\end_layout
\begin_layout Plain Layout
x <- seq(0.001, 7, length=400)
\end_layout
\begin_layout Plain Layout
y <- dgamma(x, 1.2, scale=0.7)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l", lty=4, xlab="Mean of Dog Escapes", ylab="Distribution
 of Average Dog Escapes")
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tintro40}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
My Beliefs About the Number of Dog Escapes
\begin_inset CommandInset label
LatexCommand label
name "fig:DogEscapeEV"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
In my mind, the average number of escapes is 0.84.
That's just what I think, on the basis of my experience, and the chance
that the mean number of escapes is greater than 5 is, well, really small.
I think it is more likely that monkeys will fly out of your ...
ear than it is that the average number of escapes will be 7.
\end_layout
\end_deeper
\begin_layout Standard
The first set of challenges asked us to describe the range of observed outcomes
\begin_inset Quotes eld
\end_inset
right now.
\begin_inset Quotes erd
\end_inset
Somehow, it seems more
\begin_inset Quotes eld
\end_inset
tangible
\begin_inset Quotes erd
\end_inset
and
\begin_inset Quotes eld
\end_inset
realistic
\begin_inset Quotes erd
\end_inset
to summarize observed scores.
\end_layout
\begin_layout Standard
The second set of challenges is somewhat more abstract, but to me, problems
of this type are more satisfying and interesting.
They ask us to describe the range of outcomes we might observe across a
series of experiments,
\emph on
even if we do not actually conduct the experiments!
\emph default
If you want to wait until I'm 150 years old, or until you have lived next
to the same neighbor for 100 years, you can have an exhaustive set of data
and revise your estimates.
It will be pretty boring, and, to the surprise of most people, it is not
necessary.
This latter type of reasoning, imagining the variation of estimates that
 would arise, is the most important part of statistics.
It is called
\begin_inset Quotes eld
\end_inset
inferential statistics
\begin_inset Quotes erd
\end_inset
.
From a set of observations, we can make statements not only about summary
values like
\begin_inset Quotes eld
\end_inset
means,
\begin_inset Quotes erd
\end_inset
but we can also summarize our uncertainty about those estimates.
\end_layout
\begin_layout Section
Key Terms: population and random variable
\end_layout
\begin_layout Standard
There is a problem with the word
\begin_inset Quotes eld
\end_inset
population.
\begin_inset Quotes erd
\end_inset
Ordinary people use that to refer to
\begin_inset Quotes eld
\end_inset
all the people who are alive,
\begin_inset Quotes erd
\end_inset
but statisticians usually mean something different.
Quite often, statisticians use the term population to mean
\begin_inset Quotes eld
\end_inset
something from which observations are drawn
\begin_inset Quotes erd
\end_inset
and the conclusions they draw about that process are thought to characterize
the process that generates observations, rather than simply describing
\begin_inset Quotes eld
\end_inset
all of the people.
\begin_inset Quotes erd
\end_inset
To differentiate the
\begin_inset Quotes eld
\end_inset
population as process
\begin_inset Quotes erd
\end_inset
from
\begin_inset Quotes eld
\end_inset
population as finite collection
\begin_inset Quotes erd
\end_inset
usages of the term, some will refer to the former as a
\begin_inset Quotes eld
\end_inset
superpopulation.
\begin_inset Quotes erd
\end_inset
That term is not widely used in applied statistics, but statisticians do
use it when they are trying to be very clear.
\end_layout
\begin_layout Standard
Most of the time, when we are talking about statistics, we are not talking
about estimating features of
\begin_inset Quotes eld
\end_inset
the people
\begin_inset Quotes erd
\end_inset
(or
\begin_inset Quotes eld
\end_inset
the fish in the river
\begin_inset Quotes erd
\end_inset
or whatever), instead we want to characterize the process that governs
the creation of data.
If we were studying a finite population, our conclusions would be out of
 date as soon as the research was reported, since members of that finite
 population would have died or otherwise exited.
\end_layout
\begin_layout Standard
When we study the population (the superpopulation), researchers usually
have in mind a bigger meaning:
\end_layout
\begin_layout Description
population: the process that generates observations.
This has the same meaning as
\begin_inset Quotes eld
\end_inset
stochastic process
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Standard
We seek to
\begin_inset Quotes eld
\end_inset
characterize
\begin_inset Quotes erd
\end_inset
a population by a mathematical formula, an equation that depends on some
important coefficients we call
\begin_inset Quotes eld
\end_inset
parameters.
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Description
parameter: adjustable characteristic that alters the qualities of a probability
process.
\end_layout
\begin_layout Standard
Our goal in probability analysis is to write down a
\begin_inset Quotes eld
\end_inset
mathematical model
\begin_inset Quotes erd
\end_inset
that we can understand, and then investigate its properties.
We usually do that by imagining how predictable observations from that
process might be.
\end_layout
\begin_layout Description
random
\begin_inset space ~
\end_inset
variable: an observation (a number) drawn from a population (a score
\begin_inset Quotes eld
\end_inset
pulled
\begin_inset Quotes erd
\end_inset
from a random process that is thought to be governed by mathematical laws).
\end_layout
\begin_layout Standard
Once we have a pretty good understanding of the theoretical properties,
then we might try to translate them into a study of observations from out
there in the
\begin_inset Quotes eld
\end_inset
real world.
\begin_inset Quotes erd
\end_inset
We wonder,
\begin_inset Quotes eld
\end_inset
do those
\emph on
things
\emph default
we observe
\emph on
out there
\emph default
come from a process that resembles the theory that we explore in our minds?
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Description
Example: Rolling dice.
If I say each side of a die is likely to land facing up with probability
\begin_inset Formula $1/6$
\end_inset
, I don't mean to say I've rolled a million dice and counted.
I intend to characterize the process itself, not the results on a million
rolls.
If I roll the die 50 times and try to reevaluate the probabilities, I'm
not trying to estimate the number of 1's I'd get in a sample of 1 million
rolls.
Instead, I want to know the chances of a 1 on any given roll of that die.
\end_layout
\begin_layout Description
Example: Coin flips.
Imagine a fair coin.
Each outcome is equally likely: the probability of a head is 0.5 and the
 probability of a tail is 0.5.
We have a probability model for that kind of process.
The
\begin_inset Quotes eld
\end_inset
Binomial distribution
\begin_inset Quotes erd
\end_inset
describes the chances of observing a certain number of heads and tails.
Then we wonder if the referee who administers coin flips before football
games is a random process of that type.
If there are 10 games this week, we might collect a string of data, {H,T,H,...}.
Those ten flips represent a collection of scores from a random process.
From that sample, we often want to find out if the referee's coin is
\begin_inset Quotes eld
\end_inset
fair.
\begin_inset Quotes erd
\end_inset
That is, does the data match the theory?
\end_layout
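\begin_layout Standard
To make that concrete, R will report those chances directly.
This is only a sketch, assuming a fair coin and 10 independent flips:
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
## chance of observing 0, 1, ..., 10 heads in 10 independent flips of a fair coin
\end_layout
\begin_layout Plain Layout
dbinom(0:10, size=10, prob=0.5)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout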
\begin_layout Standard
The term
\begin_inset Quotes eld
\end_inset
random
\begin_inset Quotes erd
\end_inset
is frequently misunderstood.
It does not mean
\begin_inset Quotes eld
\end_inset
equally likely
\begin_inset Quotes erd
\end_inset
.
It means outcomes are generated according to a given probability process.
The term
\begin_inset Quotes eld
\end_inset
random
\begin_inset Quotes erd
\end_inset
thus means
\begin_inset Quotes eld
\end_inset
patterned unpredictability.
\begin_inset Quotes erd
\end_inset
A process would still be called random if there were two outcomes and one
occurs 'almost all the time.'
\end_layout
\begin_layout Subsubsection*
Finite Population Interpretation
\end_layout
\begin_layout Standard
A competing interpretation of population is the
\begin_inset Quotes eld
\end_inset
finite population
\begin_inset Quotes erd
\end_inset
view.
I think this is not usually useful in the advanced contexts of statistics,
but it is frequently taught in elementary statistics courses (even as the
teacher will usually say,
\begin_inset Quotes eld
\end_inset
this is not quite right, but it might help you to get started
\begin_inset Quotes erd
\end_inset
).
\end_layout
\begin_layout Standard
In this view, the population is thought of as a fixed collection of things
from which we draw examples (as if we were blindfolded pulling numbers
in a BINGO parlor).
\end_layout
\begin_layout Standard
In an introductory course on probability, almost every student has suffered
through silly exercises like this.
\end_layout
\begin_layout Description
The
\begin_inset space ~
\end_inset
classic
\begin_inset space ~
\end_inset
example: Consider an Urn with 1000 balls,
\begin_inset Formula $X$
\end_inset
are red,
\begin_inset Formula $1000-X$
\end_inset
are blue.
From a sample of 20 balls, 6 of which are red, estimate the fraction that
is red in the population of 1000 balls.
\end_layout
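\begin_layout Standard
Here is a sketch of the usual calculation, using the numbers from the example
 (6 red balls in a sample of 20):
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
## sample proportion of red balls
\end_layout
\begin_layout Plain Layout
phat <- 6/20
\end_layout
\begin_layout Plain Layout
## the usual estimate of the number of red balls among the 1000 (here, 300)
\end_layout
\begin_layout Plain Layout
phat * 1000
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout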
\begin_layout Standard
Perhaps this view could be useful if we could convince ourselves that there
is a fixed, finite set of things we want to know about.
But I don't have many Urns full of balls I need to study.
I'm straining myself to think of a realistic example.
Consider dental cavities among prison inmates.
Suppose that no new prisoners are brought in, none are released.
None of them can die or escape.
The
\begin_inset Quotes eld
\end_inset
population
\begin_inset Quotes erd
\end_inset
might actually mean
\begin_inset Quotes eld
\end_inset
these particular prisoners.
\begin_inset Quotes erd
\end_inset
We want to know what fraction of the prisoners have a cavity in the second
molar on the left lower jaw.
Our prison is too large to actually check them all, so we have the dentist
examine some of them.
From that sample, we might estimate the number of cavities.
\end_layout
\begin_layout Standard
I often belittle this view, saying its practitioners are mainly interested
in tedious projects, such as estimating the number of left-handed redheads
 in Cincinnati.
There are various weaknesses in the finite population interpretation.
It is not exactly fatally flawed, but it does make some parts of statistics
very difficult to understand (in my humble opinion).
\end_layout
\begin_layout Standard
Here is one such example.
The idea of drawing
\begin_inset Quotes eld
\end_inset
independently and identically distributed
\begin_inset Quotes erd
\end_inset
(iid) observations into a sample is not practical within the finite population
perspective.
We need iid observations in order to derive almost all of the results in
inferential statistics and maximum likelihood analysis.
The observations must be independent so that we can multiply their individual
 probabilities to calculate the joint probability.
They must be identical so we can act as if they come from the same distribution.
If we take the finite population approach, it is impossible to draw a sample
of identically distributed observations because taking one case out of
the population alters the characteristics of the remaining cases, and the
next draw will not be statistically identical to the original case.
\end_layout
\begin_layout Standard
In some contexts, one can reach the same conclusion from either perspective.
Because some sampling ideas are more easily developed with an urn of colored
balls, that model is still used in elementary statistics.
In some practical areas of applied statistics, where it is necessary to
find out how many acres are currently infected with a blight, the finite
approach is prominent.
But almost all of the workaday tools of modern research scientists are
based on the idea that the observations we are studying are drawn from
a random probability process, rather than a finite collection of things.
\end_layout
\begin_layout Section
Probability Theory: The Language of Statistics
\end_layout
\begin_layout Subsection
Sample
\begin_inset space ~
\end_inset
Space
\end_layout
\begin_layout Standard
Generally speaking, the sample space is the set of all
\begin_inset Quotes eld
\end_inset
outcomes
\begin_inset Quotes erd
\end_inset
(people, opinions, animals, etc) that might be observed as a result of
a random process.
\end_layout
\begin_layout Description
Discrete
\begin_inset space ~
\end_inset
Sample
\begin_inset space ~
\end_inset
Space: the possible outcomes are drawn from a countable list,
\begin_inset Formula $X=\{1,2,3,\ldots\}$
\end_inset
\end_layout
\begin_layout Description
Continuous
\begin_inset space ~
\end_inset
Sample
\begin_inset space ~
\end_inset
Space: possible outcomes are drawn from a continuum (the real numbers,
\begin_inset Formula $\mathbb{R}$
\end_inset
), either bounded, such as
\begin_inset Formula $[0,1]$
\end_inset
, infinite,
\begin_inset Formula $(-\infty,+\infty)$
\end_inset
, or half closed,
\begin_inset Formula $[0,+\infty)$
\end_inset
.
\end_layout
\begin_layout Subsection
Probability
\end_layout
\begin_layout Standard
In my opinion, it is easiest to think of probability as a property of a
\begin_inset Quotes eld
\end_inset
region
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
subset
\begin_inset Quotes erd
\end_inset
of possible outcomes.
\end_layout
\begin_layout Description
\begin_inset Formula $p(x\leq4)$
\end_inset
the chance that one randomly drawn observation
\begin_inset Formula $x$
\end_inset
is less than or equal to 4.
\begin_inset Newline newline
\end_inset
There are many different kinds of notation.
If I'm worried the reader will forget, I will often write
\begin_inset Formula $Pr(x<4)$
\end_inset
or even
\begin_inset Formula $Prob(x<4)$
\end_inset
.
There is something to be said for letters from other alphabets.
How would you feel about
\begin_inset Formula $\pi(x\leq4)$
\end_inset
?
\end_layout
\begin_layout Standard
\begin_inset Formula $p(7\leq x\leq9)$
\end_inset
the chance that
\begin_inset Formula $x$
\end_inset
is in
\begin_inset Formula $[7,9]$
\end_inset
, that is, between 7 and 9, inclusive.
\end_layout
\begin_layout Standard
I don't think it pays to invest too much effort to answer the question
\begin_inset Quotes eld
\end_inset
what is probability?
\begin_inset Quotes erd
\end_inset
This is a point of contention between competing schools of thought, and
by the time you are qualified to answer the question, you will already have
decided which camp you prefer, and the answer will seem completely obvious
to you.
\end_layout
\begin_layout Standard
In a nutshell, the competing views are as follows.
\end_layout
\begin_layout Enumerate
View 1.
Probability is
\begin_inset Quotes eld
\end_inset
long run relative frequency
\begin_inset Quotes erd
\end_inset
.
\end_layout
\begin_deeper
\begin_layout Standard
Suppose we draw observations over and over, forever.
After an infinite number of draws, the observed fraction of
\begin_inset Formula $x$
\end_inset
's observed will match
\begin_inset Formula $p(x)$
\end_inset
.
If I ask you,
\begin_inset Quotes eld
\end_inset
what is the chance that the next outcome will be
\begin_inset Formula $x$
\end_inset
,
\begin_inset Quotes erd
\end_inset
and you answer,
\begin_inset Quotes eld
\end_inset
wait a minute, I have to conduct an infinite series of experiments to find
out,
\begin_inset Quotes erd
\end_inset
then you belong in this group of scholars.
\end_layout
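\begin_layout Standard
If you would rather not conduct an infinite series of experiments, a small
 simulation sketch gives the flavor of the idea: the running proportion
 of heads in a long sequence of fair coin flips settles down near 0.5.
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
## simulate 10000 flips of a fair coin (1 = head)
\end_layout
\begin_layout Plain Layout
flips <- rbinom(10000, size=1, prob=0.5)
\end_layout
\begin_layout Plain Layout
## running proportion of heads after each flip
\end_layout
\begin_layout Plain Layout
runningprop <- cumsum(flips)/seq_along(flips)
\end_layout
\begin_layout Plain Layout
## check in on the proportion after 10, 100, 1000, and 10000 flips
\end_layout
\begin_layout Plain Layout
runningprop[c(10, 100, 1000, 10000)]
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout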
\end_deeper
\begin_layout Enumerate
View 2.
Probability as
\begin_inset Quotes eld
\end_inset
degree of belief
\begin_inset Quotes erd
\end_inset
.
\end_layout
\begin_deeper
\begin_layout Standard
Probability models summarize a person's theory of the world, a belief about
what is likely to happen in any one
\begin_inset Quotes eld
\end_inset
flip of the coin
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
roll of the dice.
\begin_inset Quotes erd
\end_inset
How strongly do you believe that the next observation will be
\begin_inset Formula $x$
\end_inset
? If your answer is,
\begin_inset Formula $p(x)$
\end_inset
, then you belong in this group.
\end_layout
\end_deeper
\begin_layout Standard
These two views are, essentially, the difference of opinion that keeps the
\begin_inset Quotes eld
\end_inset
frequentist
\begin_inset Quotes erd
\end_inset
statisticians and the
\begin_inset Quotes eld
\end_inset
Bayesian
\begin_inset Quotes erd
\end_inset
statisticians at war with each other.
\end_layout
\begin_layout Subsection*
How to avoid the philosophical disagreement over the meaning of probability.
\end_layout
\begin_layout Standard
Consider a discrete variable that can take on values
\begin_inset Formula $\{1,2,3,4,5\}$
\end_inset
.
\end_layout
\begin_layout Standard
A probability model must list the possible outcomes and assign a number
\begin_inset Formula $p(x_{i})$
\end_inset
to each one.
\end_layout
\begin_layout Standard
\begin_inset Tabular
<lyxtabular version="3" rows="2" columns="7">
<features tabularvalignment="middle">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
Outcome
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $x_{i}=$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
2
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
3
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
4
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
5
\end_layout
\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
probability
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $p(x_{i})$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1/5
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1/5
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1/5
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1/5
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1/5
\end_layout
\end_inset
</cell>
</row>
</lyxtabular>
\end_inset
\end_layout
\begin_layout Standard
Here we suppose the 5 outcomes are equally likely.
\end_layout
\begin_layout Standard
It is not necessary to take that view, however.
Fiddle the
\begin_inset Formula $p$
\end_inset
's however you like:
\end_layout
\begin_layout Standard
\begin_inset Tabular
<lyxtabular version="3" rows="2" columns="7">
<features tabularvalignment="middle">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
Outcome
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $x_{i}=$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
2
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
3
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
4
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
5
\end_layout
\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
probability
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $p(x_{i})$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.05
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.1
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.2
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.3
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.35
\end_layout
\end_inset
</cell>
</row>
</lyxtabular>
\end_inset
\end_layout
\begin_layout Standard
And your friend who hates the number 3 might as well have a turn writing
down his favorite model:
\end_layout
\begin_layout Standard
\begin_inset Tabular
<lyxtabular version="3" rows="2" columns="7">
<features tabularvalignment="middle">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
Outcome
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $x_{i}=$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
2
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
3
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
4
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
5
\end_layout
\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
probability
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $p(x_{i})$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.25
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.25
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.0
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.25
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0.25
\end_layout
\end_inset
</cell>
</row>
</lyxtabular>
\end_inset
\end_layout
\begin_layout Standard
These are all probability models because:
\end_layout
\begin_layout Enumerate
No outcome is less likely than impossible (that is,
\begin_inset Formula $p(x_{i})\geq0.0$
\end_inset
).
\end_layout
\begin_layout Enumerate
The sum of the probabilities of all outcomes is 1.0.
\end_layout
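\begin_layout Standard
Both requirements are easy to check mechanically.
 As an illustrative sketch in R (using the first model above), we can test any proposed vector of probabilities:
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
## the first probability model above
\end_layout
\begin_layout Plain Layout
p <- c(0.05, 0.1, 0.2, 0.3, 0.35)
\end_layout
\begin_layout Plain Layout
all(p >= 0)                  ## requirement 1: no negative probabilities
\end_layout
\begin_layout Plain Layout
isTRUE(all.equal(sum(p), 1)) ## requirement 2: probabilities sum to 1
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout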
\begin_layout Standard
The reader can reassign the probabilities in any way he or she wants to,
as long as the result meets those requirements.
The result will be a probability model, in my opinion.
\end_layout
\begin_layout Standard
I suggest you stop worrying about the subjective meaning of
\begin_inset Formula $p(x)$
\end_inset
at this stage.
Just try to develop an intuition for the analysis that we perform with
these numbers.
The mathematics of probability analysis will
\begin_inset Quotes eld
\end_inset
flow,
\begin_inset Quotes erd
\end_inset
no matter how you interpret the probability values.
That's why I don't think it is worthwhile to spend too much time worrying
about what a probability number
\begin_inset Quotes eld
\end_inset
really
\emph on
is
\emph default
.
\begin_inset Quotes erd
\end_inset
If you really want to get metaphysical about it, please wait.
\end_layout
\begin_layout Subsection
Multiplication and Addition Principles
\end_layout
\begin_layout Description
Multiplication
\begin_inset space ~
\end_inset
Principle.
The chance that
\begin_inset Formula $m$
\end_inset
separate things will happen equals the product of their individual probabilities.
\end_layout
\begin_deeper
\begin_layout Description
Example
\begin_inset space ~
\end_inset
1: Roll a die.
Let's suppose it is a fair die.
The chance of it landing on 1 is
\begin_inset Formula $1/6$
\end_inset
.
Roll again.
The chance of 1 again is
\begin_inset Formula $1/6$
\end_inset
.
Thus, the chance of rolling 1 twice in a row is
\begin_inset Formula $\frac{1}{6}\times\frac{1}{6}=\frac{1}{36}$
\end_inset
.
The chance of rolling another 1 is
\begin_inset Formula $1/6,$
\end_inset
but the chance of rolling three
\begin_inset Formula $1$
\end_inset
's is
\begin_inset Formula $\frac{1}{6}\times\frac{1}{6}\times\frac{1}{6}=\frac{1}{216}=0.00462963$
\end_inset
.
And so forth.
\end_layout
\begin_deeper
\begin_layout Standard
Now, suppose I pull a die from my pocket and roll three 1's in a row.
If the die is fair, you've observed something that is fairly unusual.
The probability of three 1's is smaller than
\begin_inset Formula $5$
\end_inset
 out of 1,000 sets of three rolls.
Now suppose I roll another 1.
Wow, that's crazy! The multiplication principle says the probability of
a fair die generating four 1's is
\begin_inset Formula $0.000771605$
\end_inset
.
Then I roll another 1! The probability is
\begin_inset Formula $0.0001286$
\end_inset
.
A fair die would give me five 1's in a row only about once in 7,776 tries.
\end_layout
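\begin_layout Standard
The numbers in this story follow directly from the Multiplication Principle; a quick check in R:
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
(1/6)^2  ## two 1's in a row: 1/36
\end_layout
\begin_layout Plain Layout
(1/6)^3  ## three 1's: about 0.00463
\end_layout
\begin_layout Plain Layout
(1/6)^4  ## four 1's: about 0.000772
\end_layout
\begin_layout Plain Layout
(1/6)^5  ## five 1's: about 0.000129, roughly 1 in 7776
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout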
\begin_layout Standard
At some point, you have to interrupt me with this story and ask to inspect
my die.
It is a special one I had made up just for this purpose.
There is only one spot on all six faces.
\end_layout
\begin_layout Standard
And that is the lesson to be learned.
If you work out a probability model, and then observations seem to fly
in the face of the probability model, then you have to at least consider
the possibility that your probability model is wrong.
\end_layout
\end_deeper
\begin_layout Description
Example
\begin_inset space ~
\end_inset
2: Survey 1,000 people, asking each whether the death penalty should be enforced
 against convicted first-degree murderers.
The chance that respondent 1 will say yes is
\begin_inset Formula $p(x_{1}=Yes)$
\end_inset
.
The chance that the second person says yes is
\begin_inset Formula $p(x_{2}=Yes)$
\end_inset
, and so forth.
Then the chance of observing a particular
\begin_inset Formula $(Yes,Yes,No,Yes,\ldots,No)$
\end_inset
pattern is the product of all respondent probabilities:
\begin_inset Formula
\begin{eqnarray}
& & p(x_{1}=Yes,x_{2}=Yes,x_{3}=No,x_{4}=Yes,\ldots,x_{1000}=No)\nonumber \\
& = & p(x_{1}=Yes)p(x_{2}=Yes)p(x_{3}=No)p(x_{4}=Yes)\ldots p(x_{1000}=No).\label{eq:likelihood10}
\end{eqnarray}
\end_inset
\end_layout
\begin_deeper
\begin_layout Standard
The probability of observing a whole sample is equal to the products of
the individual events.
\end_layout
\begin_layout Standard
This is only true, obviously, if each person's response is independent of
each other person's response, but that is commonly assumed in public opinion
surveys.
The probability of observing a sample of responses is thus calculated by
combining the probabilities of individual responses.
\end_layout
\begin_layout Standard
We can use a mathematical trick to make analysis of this overall probability
more manageable.
Remember the following mathematical law for logarithms:
\begin_inset Formula
\begin{equation}
log(x\cdot y)=log(x)+log(y).
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
The mnemonic for that is,
\begin_inset Quotes eld
\end_inset
The log of a product is the sum of the logs.
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Standard
That means we can convert the long quantity in expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:likelihood10"
\end_inset
) into a sum of logs,
\begin_inset Formula
\begin{align}
log(p(x_{1}=Yes,x_{2}=Yes,x_{3}=No,x_{4}=Yes,\ldots,x_{1000}=No))=\nonumber \\
log(p(x_{1}=Yes))+log(p(x_{2}=Yes))+\ldots+log(p(x_{1000}=No)).
\end{align}
\end_inset
\end_layout
\begin_layout Standard
In a more advanced statistical context, this kind of reasoning is known
as
\begin_inset Quotes eld
\end_inset
maximum likelihood analysis,
\begin_inset Quotes erd
\end_inset
a procedure through which we study the probability of observing a set of
responses as a product of individual probabilities.
This last expression would be known as the
\begin_inset Quotes eld
\end_inset
log of the likelihood function.
\begin_inset Quotes erd
\end_inset
It is a very important foundation in probability analysis.
\end_layout
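\begin_layout Standard
The product-to-sum conversion is easy to confirm numerically.
 A sketch in R with a few hypothetical response probabilities (the values below are made up for illustration):
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
probs <- c(0.6, 0.7, 0.4, 0.6)  ## hypothetical individual probabilities
\end_layout
\begin_layout Plain Layout
prod(probs)       ## probability of the whole pattern
\end_layout
\begin_layout Plain Layout
log(prod(probs))  ## log of the product
\end_layout
\begin_layout Plain Layout
sum(log(probs))   ## sum of the logs: the same number
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout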
\end_deeper
\begin_layout Description
Note: These calculations presuppose the two events are
\begin_inset Quotes eld
\end_inset
independent.
\begin_inset Quotes erd
\end_inset
The chance that one will occur is not affected by the chance that the other
will occur.
\end_layout
\end_deeper
\begin_layout Description
Addition
\begin_inset space ~
\end_inset
Principle.
To determine the probability that one event among a list of possibilities
might occur, add their probabilities together.
\end_layout
\begin_deeper
\begin_layout Description
Example
\begin_inset space ~
\end_inset
1: Roll a die with 6 sides.
How likely am I to roll a number smaller than 3?
\begin_inset Formula
\[
p(x=1)+p(x=2)=\sum_{i<3}p(x=i)
\]
\end_inset
\end_layout
\begin_layout Description
Example
\begin_inset space ~
\end_inset
2: People can be right-handed, left-handed, or ambidextrous.
 What is the probability that a randomly selected person will not be left-handed?
\begin_inset Formula
\[
p(x="righthanded")+p(x="ambidexterous")
\]
\end_inset
\end_layout
\begin_layout Description
Example
\begin_inset space ~
\end_inset
3: The chance that a randomly chosen American will say she is a
\begin_inset Quotes eld
\end_inset
strong Democrat
\begin_inset Quotes erd
\end_inset
is 0.15.
The chance that a person is a
\begin_inset Quotes eld
\end_inset
weak Democrat
\begin_inset Quotes erd
\end_inset
is 0.20.
That means the chance that the person is a Democrat, either strong or weak,
is
\begin_inset Formula
\[
p(x="Strong\, Democrat")+p(x="weak\, Democrat")=0.15+0.20
\]
\end_inset
\end_layout
\begin_layout Standard
The chance that one among a collection of possibilities will occur is equal
to the sum of their probabilities.
\end_layout
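\begin_layout Standard
For a fair die, the Addition Principle in Example 1 can be verified in R (a sketch):
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
p <- rep(1/6, 6)  ## fair die: each face has probability 1/6
\end_layout
\begin_layout Plain Layout
sum(p[1:2])       ## chance of a roll smaller than 3: 1/3
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout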
\end_deeper
\begin_layout Section
Terminology for Discrete Distributions
\end_layout
\begin_layout Standard
A discrete distribution is one for which we have a list of possible outcomes,
\begin_inset Formula $\{x_{1},x_{2},\ldots,x_{m}\}$
\end_inset
.
Outcomes can be placed into one-to-one correspondence with the counting numbers.
\end_layout
\begin_layout Standard
If an observation is drawn at random, the chance that it will fall into
the
\begin_inset Formula $j^{th}$
\end_inset
category, meaning the observed value will be
\begin_inset Formula $x_{j}$
\end_inset
, is
\begin_inset Formula $p(x_{j}).$
\end_inset
\end_layout
\begin_layout Subsection
Probability Mass Function
\end_layout
\begin_layout Standard
Any formula
\begin_inset Formula $p(x)$
\end_inset
can be a
\series bold
probability mass function
\series default
(PMF) if
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
1\geq p(x)\geq0
\]
\end_inset
and
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
\sum_{x_{j}\in X}p(x_{j})=\sum_{j=1}^{m}p(x_{j})=1
\]
\end_inset
\end_layout
\begin_layout Standard
Sometimes I use notation
\begin_inset Formula $P(X=k)$
\end_inset
instead of
\begin_inset Formula $p(x_{k}).$
\end_inset
Either way should be clear enough.
\end_layout
\begin_layout Standard
Discrete distributions are needed when the process we are considering offers
up outcomes that are
\begin_inset Quotes eld
\end_inset
grainy,
\begin_inset Quotes erd
\end_inset
in the sense that they cannot be averaged together.
In the case of a die, for example, we can observe a 3 or a 4, but nothing
in between.
We never roll 3.5.
An animal is either a cat or a dog; except in cartoons, there is no such
thing as a
\begin_inset Quotes eld
\end_inset
cat-dog
\begin_inset Quotes erd
\end_inset
.
\end_layout
\begin_layout Section
Continuous Random Variables
\end_layout
\begin_layout Standard
Until now, I made this easy on myself by discussing coins, dice, and Democrats.
Most of the important distributions we need to work on are not discrete.
Rather, they are continuous: the outcomes correspond to segments of the
real number line or the Cartesian plane.
\end_layout
\begin_layout Standard
With continuous numbers, we might have a temperature of 80, or 90, or any
number between them.
Variables like time, the proportion of citizens who call themselves Democrats,
blood pressure, and so forth, lend themselves much more readily to treatment
on a mathematical continuum.
\end_layout
\begin_layout Standard
How are
\begin_inset Quotes eld
\end_inset
continuous numbers
\begin_inset Quotes erd
\end_inset
different from
\begin_inset Quotes eld
\end_inset
discrete numbers
\begin_inset Quotes erd
\end_inset
? I hate to bring up bad memories for readers, but do you remember high
school geometry? When the teacher discussed points and lines, did she say
\begin_inset Quotes eld
\end_inset
a point has no width
\begin_inset Quotes erd
\end_inset
? That seemed wrong.
It bothered me.
It seemed obvious that if I drew a dot on the paper, my point did have
some width.
With a magnifying glass and a ruler, I could measure it.
With a microscope, I could even see texture inside the dot.
That bothered me so much I couldn't pay attention for a whole semester.
\end_layout
\begin_layout Standard
Here's the part I did not appreciate.
When I measured the width of the pencil mark, I was simply mistaken in
thinking I was measuring a point.
I was measuring the distance from the outer left edge of the pencil dot
to the outer right edge of the dot.
I was not measuring
\begin_inset Quotes eld
\end_inset
a point,
\begin_inset Quotes erd
\end_inset
I was measuring the diameter of a region.
I was measuring the
\begin_inset Quotes eld
\end_inset
distance between two points.
\begin_inset Quotes erd
\end_inset
Points play the role of
\begin_inset Quotes eld
\end_inset
edge markers
\begin_inset Quotes erd
\end_inset
and two points mark off a region.
\end_layout
\begin_layout Standard
In the same sense that a
\begin_inset Quotes eld
\end_inset
point has no width,
\begin_inset Quotes erd
\end_inset
a single point has no probability.
A single point does have
\begin_inset Quotes eld
\end_inset
probability density,
\begin_inset Quotes erd
\end_inset
however.
And, after we understand density, we can make the next step to measure
the
\begin_inset Quotes eld
\end_inset
probability
\begin_inset Quotes erd
\end_inset
between two points (no magnifying glass required).
\end_layout
\begin_layout Subsection
PDF: probability density function:
\end_layout
\begin_layout Standard
The probability density function (PDF) describes the probability density
of each possible outcome.
The term
\begin_inset Quotes eld
\end_inset
probability density
\begin_inset Quotes erd
\end_inset
is somewhat elusive, but I expect it will reveal itself to you before this
section is finished.
\end_layout
\begin_layout Standard
For the sake of clarity (to keep the continuous density separate from the
 discrete probability model in our minds), it is customary to use a different
 letter for the PDF.
I'll use the letter
\begin_inset Formula $f$
\end_inset
, most of the time.
\end_layout
\begin_layout Standard
A PDF
\begin_inset Formula $f$
\end_inset
for
\begin_inset Formula $x$
\end_inset
must meet two requirements.
\end_layout
\begin_layout Enumerate
The value of
\begin_inset Formula $f(x)$
\end_inset
cannot be negative
\begin_inset Formula
\begin{equation}
f(x)\geq0
\end{equation}
\end_inset
\end_layout
\begin_layout Enumerate
The area under that function must be
\begin_inset Formula $1.0$
\end_inset
.
\begin_inset Formula
\begin{equation}
\int\,\, f(x)dx=1.0
\end{equation}
\end_inset
\end_layout
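\begin_layout Standard
R's built-in densities satisfy both requirements, and integrate() can confirm the second one numerically.
 A sketch using the Uniform density dunif:
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
all(dunif(seq(0, 1, by = 0.1)) >= 0)         ## requirement 1: f(x) >= 0
\end_layout
\begin_layout Plain Layout
integrate(dunif, lower = 0, upper = 1)$value ## requirement 2: area is 1
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout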
\begin_layout Standard
The domain of a continuous variable might be the whole range from
\begin_inset Formula $(-\infty,+\infty).$
\end_inset
Any
\begin_inset Quotes eld
\end_inset
chunk
\begin_inset Quotes erd
\end_inset
of the real line will also do, such as
\begin_inset Formula $[0,1]$
\end_inset
or
\begin_inset Formula $(0,\infty)$
\end_inset
.
\end_layout
\begin_layout Standard
A continuous density function must be interpreted differently than a discrete
distribution.
\end_layout
\begin_layout Subsubsection*
Where do probability density functions come from?
\end_layout
\begin_layout Standard
Any function that has nonnegative values and a finite value for its integral
can be converted into a PDF.
\end_layout
\begin_layout Standard
How can this ugly thing be a PDF?
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(from=0, to=1, by=0.01)
\end_layout
\begin_layout Plain Layout
y <- 2+exp(0.02*x)+1.2*sin(pi*(2*x))+1.8*cos(4*pi*x)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l",xlab="x",ylab="",main="")
\end_layout
\begin_layout Plain Layout
text( x[30], y[30], expression(g(x)), pos=1 )
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tu20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
A Function That Might Become a PDF
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
We need to normalize it, so that the area under the curve is 1.0.
Calculate the integral (the area under the curve between 0 and 1),
\begin_inset Formula
\[
\int_{0}^{1}g(x)dx=17.
\]
\end_inset
\begin_inset Newline newline
\end_inset
The magic number is
\begin_inset Formula $17$
\end_inset
.
We convert
\begin_inset Formula $g(x)$
\end_inset
into a PDF by division, which normalizes
\begin_inset Formula $g(x)$
\end_inset
.
\begin_inset Formula
\[
f(x)=\frac{g(x)}{17}.
\]
\end_inset
\begin_inset Newline newline
\end_inset
That is a valid PDF because
\begin_inset Formula
\[
\int_{0}^{1}\frac{g(x)}{17}dx=1.0
\]
\end_inset
Note that this assumes that
\begin_inset Formula $g(x)\geq0$
\end_inset
 for all x, but that's a pretty minor restriction, since you can add or multiply
 by constants to make almost any function nonnegative.
\end_layout
\begin_layout Standard
The function
\begin_inset Formula $g(x)$
\end_inset
in this example is called the
\begin_inset Quotes eld
\end_inset
kernel
\begin_inset Quotes erd
\end_inset
of the pdf, because it describes the substantively important variation
that we are studying.
The value
\begin_inset Formula $1/17$
\end_inset
is a
\begin_inset Quotes eld
\end_inset
normalizing constant.
\begin_inset Quotes erd
\end_inset
It is not substantively important; it is needed only to bring the area under
 the curve down from 17 to 1.0.
\end_layout
\begin_layout Description
Claim: Any nonnegative function with a finite integral can be the kernel
 of a density function.
\end_layout
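\begin_layout Standard
Whatever the kernel, integrate() can find the normalizing constant numerically.
 A sketch in R with a hypothetical nonnegative kernel (this g is made up for illustration, not the one plotted above):
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
g <- function(x) 2 + 1.2 * sin(2 * pi * x)^2  ## hypothetical kernel on [0,1]
\end_layout
\begin_layout Plain Layout
K <- integrate(g, lower = 0, upper = 1)$value ## the normalizing constant
\end_layout
\begin_layout Plain Layout
f <- function(x) g(x) / K                     ## the resulting PDF
\end_layout
\begin_layout Plain Layout
integrate(f, lower = 0, upper = 1)$value      ## area under f is now 1
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout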
\begin_layout Subsection
Cumulative Distribution Function
\begin_inset CommandInset label
LatexCommand label
name "sub:CumulativeDistributionFunction"
\end_inset
\end_layout
\begin_layout Standard
A point has no width, no probability.
But the chance of an outcome between 0.001 and 0.002 can be calculated.
That region might be what we have in mind when we say
\begin_inset Quotes eld
\end_inset
point,
\begin_inset Quotes erd
\end_inset
but it is a region.
When we talk about hitting the target in darts or golf, we don't actually
intend to hit a particular point, we mean a region between two points.
Finding the chances of an outcome in that region is the main purpose of
the cumulative distribution function (CDF).
It helps us to answer questions like
\begin_inset Quotes eld
\end_inset
what is the chance that the outcome will be greater than 9?
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
what is the chance that the outcome will be between points
\begin_inset Formula $A$
\end_inset
and
\begin_inset Formula $B$
\end_inset
?
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Standard
The cumulative distribution function is commonly referred to by the capital
letter of the density function.
\begin_inset Formula $F(x^{u})$
\end_inset
represents the probability that a randomly drawn value from the distribution
\begin_inset Formula $f(x)$
\end_inset
will be smaller than a target value,
\begin_inset Formula $x^{u}$
\end_inset
.
It is the area under
\begin_inset Formula $f$
\end_inset
from the
\begin_inset Quotes eld
\end_inset
left edge
\begin_inset Quotes erd
\end_inset
of the possible outcomes (which may be
\begin_inset Formula $-\infty$
\end_inset
) up to
\begin_inset Formula $x^{u}$
\end_inset
.
\begin_inset Formula
\begin{equation}
F(x^{u})=\int_{-\infty}^{x^{u}}f(x)dx.
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
We often refer to the CDF simply as
\begin_inset Formula $F(x)$
\end_inset
, keeping in mind that the parenthesized value is the upper limit of an
integral.
If we use the letter
\begin_inset Formula $x$
\end_inset
for the input variable in
\begin_inset Formula $F(x)$
\end_inset
, then the math teachers will want us to use some other letter for outcomes.
For example,
\begin_inset Formula
\begin{equation}
F(x)=\int_{-\infty}^{x}f(y)dy.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
I call the upper limit
\begin_inset Formula $x^{u}$
\end_inset
in order to avoid choosing a new letter.
\end_layout
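\begin_layout Standard
In R, each distribution's CDF is its p-function (punif, pnorm, plogis, and so on), and it agrees with numerical integration of the corresponding density.
 A sketch with the standard normal:
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
x_u <- 0.7
\end_layout
\begin_layout Plain Layout
integrate(dnorm, lower = -Inf, upper = x_u)$value ## area under f up to x_u
\end_layout
\begin_layout Plain Layout
pnorm(x_u)                                        ## the CDF at x_u: same number
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout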
\begin_layout Subsubsection*
A Brief Example: The Uniform Distribution
\end_layout
\begin_layout Standard
The simplest PDF is the Uniform probability model,
\begin_inset Formula $U(a,b)$
\end_inset
, which holds that all of the outcomes between
\begin_inset Formula $a$
\end_inset
and
\begin_inset Formula $b$
\end_inset
are equally likely.
The PDF is
\begin_inset Formula $f(x)=\frac{1}{b-a}$
\end_inset
and the probability that the outcome will fall between any two values,
say
\begin_inset Formula $x^{l}$
\end_inset
and
\begin_inset Formula $x^{u}$
\end_inset
, is easy to calculate:
\begin_inset Formula
\[
Prob(x^{l}\leq x\leq x^{u})=\int_{x^{l}}^{x^{u}}\frac{1}{b-a}dx=\frac{x^{u}-x^{l}}{b-a}
\]
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(from=0, to=1, by=1)
\end_layout
\begin_layout Plain Layout
y <- c(1,1)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l",xlab="x",ylab="",main="", ylim=c(0,1.2), bty="L")
\end_layout
\begin_layout Plain Layout
lines(c(0,0),c(0,1),lty=2)
\end_layout
\begin_layout Plain Layout
lines(c(1,1),c(0,1),lty=2)
\end_layout
\begin_layout Plain Layout
text(0.5, 0.5, "Any value in [0,1] may arise
\backslash
n All points are equally likely")
\end_layout
\begin_layout Plain Layout
lines(c(0,1), c(0,0), lwd=0.3)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tu10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Uniform Distribution
\begin_inset CommandInset label
LatexCommand label
name "fig:Uniform10"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
The fact that
\begin_inset Formula $f(x)=1.0$
\end_inset
does not mean that a point
\begin_inset Formula $x$
\end_inset
is
\begin_inset Quotes eld
\end_inset
certain
\begin_inset Quotes erd
\end_inset
to occur.
This is one of the little mathematical wrinkles that make PDFs different
from PMFs.
The value of
\begin_inset Formula $1.0$
\end_inset
is a density value; substantively, probability attaches to sets of outcomes,
 and the probability of an outcome in a set is given by the integral of the
 PDF over that set.
 For example, the chance that an outcome will lie between 0.2 and 0.5 is represented
 by the shaded region below:
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(from=0, to=1, by=1)
\end_layout
\begin_layout Plain Layout
y <- c(1,1)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l",xlab="x",ylab="",main="", ylim=c(0,1.2), bty="L")
\end_layout
\begin_layout Plain Layout
lines(c(0,0),c(0,1),lty=2)
\end_layout
\begin_layout Plain Layout
lines(c(1,1),c(0,1),lty=2)
\end_layout
\begin_layout Plain Layout
lines(c(0.2,0.2),c(0,1), lty=4)
\end_layout
\begin_layout Plain Layout
lines(c(0.5,0.5),c(0,1), lty=4)
\end_layout
\begin_layout Plain Layout
polygon(c(0.2,0.2,0.5,0.5),c(0,1,1,0), col="gray90")
\end_layout
\begin_layout Plain Layout
text( 0.35, 0.5, "This area represents
\backslash
n the probability
\backslash
n of an outcome
\backslash
n between 0.2 and 0.5",cex=0.8)
\end_layout
\begin_layout Plain Layout
lines(c(0,1), c(0,0), lwd=0.3)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tu11}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Probability that an Outcome will Lie Between Two Values
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Because we are considering the simple case, where
\begin_inset Formula $a=0$
\end_inset
and
\begin_inset Formula $b=1$
\end_inset
, the CDF is easily seen to be
\begin_inset Formula
\[
F(x^{u})=\int_{0}^{x^{u}}f(x)dx=x^{u}
\]
\end_inset
The linkage between the PDF and the CDF is illustrated next.
Consider the PDF of the Uniform, with special attention to the chances
of outcomes smaller than 0.3, 0.5, and 0.7.
\end_layout
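\begin_layout Standard
R's punif confirms that, for the U(0,1) distribution, the CDF at each of those points is the point itself:
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
punif(c(0.3, 0.5, 0.7))  ## F(0.3), F(0.5), F(0.7) for U(0,1)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout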
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(from=0, to=1, by=1)
\end_layout
\begin_layout Plain Layout
y <- c(1,1)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l",xlab="x",ylab="",main="", ylim=c(0,1.2), bty="L")
\end_layout
\begin_layout Plain Layout
polygon(c(0.0,0.0,0.3,0.3),c(0,1,1,0), col="gray95", border=NA)
\end_layout
\begin_layout Plain Layout
polygon(c(0.3,0.3,0.5,0.5),c(0,1,1,0), col="gray93", border=NA)
\end_layout
\begin_layout Plain Layout
polygon(c(0.5,0.5,0.7,0.7),c(0,1,1,0), col="gray90", border=NA)
\end_layout
\begin_layout Plain Layout
lines(c(0,0),c(0,1),lty=2)
\end_layout
\begin_layout Plain Layout
lines(c(1,1),c(0,1),lty=2)
\end_layout
\begin_layout Plain Layout
lines(c(0.3,0.3),c(0,1), lty=4)
\end_layout
\begin_layout Plain Layout
lines(c(0.5,0.5),c(0,1), lty=4)
\end_layout
\begin_layout Plain Layout
lines(c(0.7,0.7),c(0,1), lty=4)
\end_layout
\begin_layout Plain Layout
text( 0.15, 0.2, "outcomes
\backslash
n below
\backslash
n 0.3")
\end_layout
\begin_layout Plain Layout
text( 0.25, 0.55, "outcomes below 0.5")
\end_layout
\begin_layout Plain Layout
text( 0.45, 0.75, "outcomes below 0.7")
\end_layout
\begin_layout Plain Layout
text(0.5, 1.15, expression(f(x)))
\end_layout
\begin_layout Plain Layout
lines(c(0,1), c(0,0), lwd=0.3)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tu30}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
A Uniform Probability Density Function
\begin_inset CommandInset label
LatexCommand label
name "fig:Uniform30"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Those three values of
\begin_inset Formula $x$
\end_inset
are marked on this curve, which represents the cumulative distribution.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
plot(x, y, type="n",xlab="x",ylab = expression(F(x^u)), main="", ylim =
c(0,1.2), bty = "L")
\end_layout
\begin_layout Plain Layout
lines(c(1,1),c(0,1),lty=2)
\end_layout
\begin_layout Plain Layout
lines(c(0,1),c(0,1),lty=1)
\end_layout
\begin_layout Plain Layout
lines(c(0.3,0.3),c(0,.3), lty=4)
\end_layout
\begin_layout Plain Layout
lines(c(0.5,0.5),c(0,.5), lty=4)
\end_layout
\begin_layout Plain Layout
lines(c(0.7,0.7),c(0,.7), lty=4)
\end_layout
\begin_layout Plain Layout
text(0.3, 0.45, expression(F(0.3)))
\end_layout
\begin_layout Plain Layout
text(0.5, 0.65, expression(F(0.5)))
\end_layout
\begin_layout Plain Layout
text(0.7, 0.85, expression(F(0.7)))
\end_layout
\begin_layout Plain Layout
lines(c(0,1), c(0,0), lwd=0.3)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tu31}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Cumulative Distribution
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
All CDFs are nondecreasing functions of
\begin_inset Formula $x^{u}$
\end_inset
.
As
\begin_inset Formula $x^{u}$
\end_inset
moves to the right,
\begin_inset Formula $F(x^{u})$
\end_inset
never falls, and it grows wherever the density is positive.
For PDFs that are unimodal, the CDF is an
\begin_inset Quotes eld
\end_inset
S-shaped curve.
\begin_inset Quotes erd
\end_inset
In Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:SshapedCDF"
\end_inset
, a unimodal, symmetric PDF is presented, along with the S-shaped
CDF that it leads to.
The figure is based on the
\begin_inset Quotes eld
\end_inset
logistic distribution,
\begin_inset Quotes erd
\end_inset
a distribution that is often used in categorical models of choice because
it has a comparatively simple CDF.
It is one of the very few distributions for which the CDF appears to be
simpler than the PDF.
The PDF is
\begin_inset Formula
\begin{equation}
f(x)=\frac{1}{\sigma}\frac{exp(-\frac{x-\mu}{\sigma})}{\left(1+exp\left(-\frac{x-\mu}{\sigma}\right)\right)^{2}}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
but the CDF is simply:
\begin_inset Formula
\begin{equation}
F(x^{u})=\frac{1}{1+e^{-\frac{x^{u}-\mu}{\sigma}}}
\end{equation}
\end_inset
\end_layout
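\begin_layout Standard
The simplicity of the logistic CDF can be checked in R, where plogis matches the closed-form expression (a sketch with mu = 0 and sigma = 1):
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
x_u <- 1.3
\end_layout
\begin_layout Plain Layout
plogis(x_u)          ## built-in logistic CDF
\end_layout
\begin_layout Plain Layout
1 / (1 + exp(-x_u))  ## the closed-form CDF with mu = 0, sigma = 1
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout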
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
xseq <- seq(from=-4, to=+4, length.out=500)
\end_layout
\begin_layout Plain Layout
pseq <- dlogis(xseq)
\end_layout
\begin_layout Plain Layout
plot(xseq, pseq, type="l", xlab="x", ylab = "Probability Density", ylim=c(0,1),
main="")
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
pseq <- plogis(xseq)
\end_layout
\begin_layout Plain Layout
plot(xseq, pseq, type="l", xlab="x", ylab = "Cumulative Probability Density",
ylim=c(0,1), main="")
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tlogistic10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
The PDF of a Logistic Distribution
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tlogistic20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
The CDF of a Logistic Distribution
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\begin_inset Caption
\begin_layout Plain Layout
The CDF is S-shaped
\begin_inset CommandInset label
LatexCommand label
name "fig:SshapedCDF"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Section
Moments
\end_layout
\begin_layout Standard
The
\series bold
expected value
\series default
of a distribution is defined as the
\begin_inset Quotes eld
\end_inset
probability weighted sum
\begin_inset Quotes erd
\end_inset
of outcomes.
\end_layout
\begin_layout Standard
For a continuous distribution, with density
\begin_inset Formula $f(x)$
\end_inset
:
\begin_inset Formula
\begin{equation}
E[x]=\int_{-\infty}^{+\infty}f(x)\cdot x\, dx.\label{eq:EV}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
For a discrete distribution, with probability mass function
\begin_inset Formula $p(x_{j})$
\end_inset
:
\begin_inset Formula
\begin{equation}
E[x]=\sum_{j=1}^{m}p(x_{j})x_{j},\quad\mbox{where }p(x_{j})=\mbox{prob.\ of outcome }x_{j}.
\end{equation}
\end_inset
As a result, the expected value of x is often referred to as the
\begin_inset Quotes eld
\end_inset
population mean
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
theoretical mean
\begin_inset Quotes erd
\end_inset
of a distribution that generates
\begin_inset Formula $x$
\end_inset
.
\end_layout
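The probability-weighted sum is easy to compute directly. Here is a minimal R sketch, using a fair six-sided die as an assumed example (not from the text):

```r
# Expected value of a fair six-sided die: sum of p(x_j) * x_j
xj <- 1:6              # possible outcomes
pj <- rep(1/6, 6)      # probability of each outcome
sum(pj * xj)           ## 3.5
```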
\begin_layout Standard
The expected value is the first moment of the distribution.
The expected value of
\begin_inset Formula $x^{2}$
\end_inset
,
\begin_inset Formula $E[x^{2}]$
\end_inset
, is the second moment,
\begin_inset Formula $E[x^{3}]$
\end_inset
is the third moment, and so forth.
In a course on mathematical statistics, one learns that the moments characterize
a distribution and allow us to make many important calculations, including
the variance, as we see next.
\end_layout
\begin_layout Standard
The
\series bold
population
\begin_inset space ~
\end_inset
variance
\series default
of a distribution (or just
\begin_inset Quotes eld
\end_inset
variance
\begin_inset Quotes erd
\end_inset
) is the expected value of a squared deviation from the expected value.
That is to say, it is the
\begin_inset Quotes eld
\end_inset
probability weighted sum
\begin_inset Quotes erd
\end_inset
of the squared differences between outcomes and their expected values.
Variance is formally defined as a weighted sum of squared deviations
\begin_inset Formula
\begin{equation}
Var[x]=\int_{-\infty}^{+\infty}f(x)\cdot(x-E[x])^{2}\, dx,\label{eq:PopVariance}
\end{equation}
\end_inset
which can be rearranged as
\begin_inset Formula
\begin{equation}
Var[x]=\int_{-\infty}^{+\infty}f(x)\cdot x^{2}\, dx-E[x]^{2}=E[x^{2}]-E[x]^{2}.\label{eq:PopVariance2}
\end{equation}
\end_inset
Repeat out loud:
\begin_inset Quotes eld
\end_inset
The variance of x equals the expected value of
\begin_inset Formula $x$
\end_inset
squared minus the expected value of
\begin_inset Formula $x$
\end_inset
, squared.
\begin_inset Quotes erd
\end_inset
The variance can be calculated from the first two moments.
\end_layout
\begin_layout Standard
For a discrete distribution, the variance is defined similarly
\begin_inset Formula
\begin{equation}
Var[x]=\sum_{j=1}^{m}p(x_{j})(x_{j}-E[x])^{2},\label{eq:PopVarianceDiscrete}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
and like (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:PopVariance"
\end_inset
), it is easily rearranged as
\begin_inset Formula
\begin{equation}
E[x^{2}]-E[x]^{2}
\end{equation}
\end_inset
\end_layout
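Both forms of the variance can be checked numerically. A short R sketch, again assuming a fair six-sided die as the example distribution:

```r
# Two equivalent computations of the population variance of a fair die
xj <- 1:6
pj <- rep(1/6, 6)
EV <- sum(pj * xj)             # expected value, 3.5
v1 <- sum(pj * (xj - EV)^2)    # probability-weighted squared deviations
v2 <- sum(pj * xj^2) - EV^2    # E[x^2] - E[x]^2
all.equal(v1, v2)              ## TRUE; both are 35/12
```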
\begin_layout Subsection
How are Expected Value and Population Variance different from the
\begin_inset Quotes eld
\end_inset
average
\begin_inset Quotes erd
\end_inset
and the
\begin_inset Quotes eld
\end_inset
variance
\begin_inset Quotes erd
\end_inset
?
\end_layout
\begin_layout Standard
Here is a Very Important Point: Expected value and population variance are
\begin_inset Quotes eld
\end_inset
theoretical quantities.
\begin_inset Quotes erd
\end_inset
They are characterizations of a probability model.
\end_layout
\begin_layout Standard
One can estimate the expected value and the variance with a sample, but
should never (never) lose sight of the fact that
\begin_inset Formula $E[x]$
\end_inset
and
\begin_inset Formula $Var[x]$
\end_inset
are defined by a probability model.
\end_layout
\begin_layout Standard
The relationship between a sample mean and the expected value is demonstrated
most easily with a discrete variable.
Suppose a sample of scores is collected.
The observed frequencies, the counts for outcome
\begin_inset Formula $x_{j}$
\end_inset
, would be
\begin_inset Formula
\begin{equation}
freq(x_{j})=\frac{\mbox{\# of observations of }x_{j}}{\mbox{\# of observations}}\label{eq:freq}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
The sample average, which is often referred to as
\begin_inset Formula $\bar{x}$
\end_inset
(pronounced
\begin_inset Quotes eld
\end_inset
x bar
\begin_inset Quotes erd
\end_inset
), is familiar to us as the sum of observations divided by
\begin_inset Formula $N$
\end_inset
.
With discrete data (
\begin_inset Formula $m$
\end_inset
possible outcomes), the sample average can also be calculated as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\bar{x}=\sum_{j=1}^{m}freq(x_{j})x_{j}\label{eq:averageWithFreq}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
The formula for the expected value (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:EV"
\end_inset
) is almost the same, except that the observed
\begin_inset Formula $freq(x_{j})$
\end_inset
is replaced by the
\emph on
true probability
\emph default
\begin_inset Formula $p(x_{j})$
\end_inset
.
\end_layout
\begin_layout Standard
Similarly, the sample variance can be calculated as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\sum_{j=1}^{m}freq(x_{j})\cdot(x_{j}-\bar{x})^{2}\label{eq:varianceWithFreq}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
which is almost the same as the population variance in (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:PopVarianceDiscrete"
\end_inset
).
\end_layout
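This equivalence is easy to verify. A small R sketch with a made-up sample (the data values here are hypothetical):

```r
# The frequency-weighted sum reproduces the ordinary sample average
x <- c(1, 2, 2, 3, 3, 3)
tab <- table(x)
freqs <- as.numeric(tab) / length(x)   # freq(x_j), relative frequencies
xj <- as.numeric(names(tab))           # the distinct outcomes x_j
xbar <- sum(freqs * xj)
all.equal(xbar, mean(x))               ## TRUE
```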
\begin_layout Standard
Now, I would like to unburden myself by expressing frustration about statistical
notation.
In many advanced statistical models, a parameter is given a Greek letter,
say
\begin_inset Formula $\mu$
\end_inset
or
\begin_inset Formula $\lambda$
\end_inset
, and an estimate calculated from data is differentiated by the addition
of a hat, as in
\begin_inset Formula $\widehat{\mu}$
\end_inset
or
\begin_inset Formula $\widehat{\lambda}$
\end_inset
.
That practice is clear and consistent, but it has not trickled down to
introductory statistics.
In introductory statistics, the sample mean is commonly called
\begin_inset Formula $\bar{x}$
\end_inset
, even though it would be more meaningfully written as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\widehat{E[x]}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Some models will have a parameter, say
\begin_inset Formula $\mu_{x}$
\end_inset
or
\begin_inset Formula $\lambda_{x}$
\end_inset
, that we find is equal to
\begin_inset Formula $E[x]$
\end_inset
.
In those cases, it is only natural to refer to the sample average as
\begin_inset Formula $\widehat{\mu_{x}}$
\end_inset
or
\begin_inset Formula $\widehat{\lambda_{x}}$
\end_inset
.
Nevertheless, it is quite common to refer to a sample average as
\begin_inset Formula $\bar{x}$
\end_inset
.
\end_layout
\begin_layout Standard
Notation about estimates of variance is even more troublesome.
The most direct notation for a sample estimate of variance would be
\begin_inset Formula $\widehat{Var[x]}$
\end_inset
, and yet that is almost never done.
Instead, most authors seize on the fact that in one particular distribution,
there is a parameter called
\begin_inset Formula $\sigma^{2}$
\end_inset
that determines the distribution's variance.
If
\begin_inset Formula $Var[x]=\sigma_{x}^{2}$
\end_inset
, one will find all manner of notations to refer to the sample estimate,
such as
\begin_inset Formula $s_{x}^{2}$
\end_inset
.
Before computerized typesetting made it convenient to insert symbols above
characters,
\begin_inset Formula $s_{x}^{2}$
\end_inset
was expedient.
Nevertheless, it is much clearer to refer to an estimate as
\begin_inset Formula $\widehat{Var[x]}$
\end_inset
or
\begin_inset Formula $\widehat{\sigma_{x}^{2}}$
\end_inset
.
\end_layout
\begin_layout Section
The Forces of Nature May Not Know Our Formula (or agree with us about the
parameters).
\end_layout
\begin_layout Standard
There's a lot of discussion in philosophy of science about whether or not
our models exist
\begin_inset Quotes eld
\end_inset
out there
\begin_inset Quotes erd
\end_inset
in the world.
Do leaves optimally rotate to consume sunlight? What does it mean to say
the coin behaves
\begin_inset Quotes eld
\end_inset
as if
\begin_inset Quotes erd
\end_inset
it follows a probability model? I'm not sure if coins, trees, or social
events can do math; I'm inclined to think they do not.
Nevertheless, I also think that the patterns observed in nature can be
described by mathematical patterns, patterns that we socially construct
and exchange among ourselves in mathematical language.
And as we interact with nature, we try to improve our mental models by
adjusting them.
\end_layout
\begin_layout Standard
A parameter is a numerical coefficient in a formula.
It exists in our minds.
We may project that parameter onto nature, saying things like
\begin_inset Quotes eld
\end_inset
the data is produced by a process with parameters
\begin_inset Formula $\alpha$
\end_inset
and
\begin_inset Formula $\beta$
\end_inset
.
\begin_inset Quotes erd
\end_inset
That kind of statement is common.
It is certainly more fun to say
\begin_inset Quotes eld
\end_inset
God generated this data with parameters
\begin_inset Formula $\gamma$
\end_inset
and
\begin_inset Formula $\phi$
\end_inset
.
\begin_inset Quotes erd
\end_inset
I'm afraid this way of talking does confuse my students, but I still do
it because it is fun, and somehow it exposes our mission.
We believe parameters determine the generation of data; we want to know
if our current understanding of those parameters is accurate.
\end_layout
\begin_layout Standard
The
\begin_inset Quotes eld
\end_inset
parameter estimates
\begin_inset Quotes erd
\end_inset
that are gathered from observation are not thought of as
\begin_inset Quotes eld
\end_inset
exact matches
\begin_inset Quotes erd
\end_inset
for the parameters in the models in our minds.
If they were exact, science would be simpler, and possibly less interesting.
Any experiment or set of observations will lead to a statement about the
parameters that are most likely to correspond with the data.
\end_layout
\begin_layout Section
Some Distributions
\end_layout
\begin_layout Standard
If any function, or just about any function, can serve as the basis for
the creation of a probability density function, how can we bring our research
problem under control? Are we supposed to study just any old function that
pops into our heads?
\end_layout
\begin_layout Standard
I'm sorry to say the answer appears to be
\begin_inset Quotes eld
\end_inset
yes,
\begin_inset Quotes erd
\end_inset
or, perhaps
\begin_inset Quotes eld
\end_inset
maybe.
\begin_inset Quotes erd
\end_inset
On one hand, statisticians have already built a catalog of useful distributions.
We have pretty good reason to believe that there are 10 or 15
\begin_inset Quotes eld
\end_inset
really important
\begin_inset Quotes erd
\end_inset
distributions, functions that active researchers remember by name.
On the other hand, there is an almost indefinite variety of possible probability
density functions.
In the late 1980s, I recall seeing a list of about 90 distributions that
were more or less understandably different.
In the early 2000s, the list had grown to 130 or so.
Virtually any distribution can be generalized, distorted, truncated, or
otherwise varied.
\end_layout
\begin_layout Standard
In the following list, I have collected the distributions that are most
important in the foundations for most students.
These are not comprehensive treatments, of course.
Those can be found in many other places.
My emphasis here is on understanding the
\begin_inset Quotes eld
\end_inset
shape
\begin_inset Quotes erd
\end_inset
of the various distributions, recognizing their parameters and the moments
of the distributions.
\end_layout
\begin_layout Standard
A checklist of the features that are worth mentioning for each distribution
should include
\end_layout
\begin_layout Itemize
domain.
For what values is the formula defined? Does it take values in the whole
real number line, or just for positive values, or perhaps just an interval
like
\begin_inset Formula $[0,1]$
\end_inset
?
\end_layout
\begin_layout Itemize
location.
Where is the
\begin_inset Quotes eld
\end_inset
center
\begin_inset Quotes erd
\end_inset
of the distribution (and is that center point substantively meaningful?).
\end_layout
\begin_layout Itemize
shape.
Is it unimodal? Is it symmetric about its center point?
\end_layout
\begin_layout Itemize
scale.
How widely
\begin_inset Quotes eld
\end_inset
spread out
\begin_inset Quotes erd
\end_inset
are the scores?
\end_layout
\begin_layout Subsection
Exponential Distribution
\end_layout
\begin_layout Standard
The exponential distribution is shaped like a
\begin_inset Quotes eld
\end_inset
ski slope,
\begin_inset Quotes erd
\end_inset
as illustrated in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "cap:Exponentialrate1"
\end_inset
.
It represents the time that one must wait before an
\begin_inset Quotes eld
\end_inset
event
\begin_inset Quotes erd
\end_inset
occurs if the chance of an event depends only on the amount of time that
passes.
\begin_inset Quotes eld
\end_inset
Delta t
\begin_inset Quotes erd
\end_inset
is the amount of time that passes,
\begin_inset Formula $\Delta t=t_{2}-t_{1}$
\end_inset
.
If the probability of an
\begin_inset Quotes eld
\end_inset
event
\begin_inset Quotes erd
\end_inset
is
\begin_inset Formula $\lambda\cdot\Delta t$
\end_inset
(for
\begin_inset Formula $\Delta t$
\end_inset
shrinking to
\begin_inset Formula $0$
\end_inset
), then the time waited before an event is exponentially distributed.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
rate <- 1
\end_layout
\begin_layout Plain Layout
upper <- 10
\end_layout
\begin_layout Plain Layout
xvals <- seq(0, upper, by=0.02)
\end_layout
\begin_layout Plain Layout
yvals1 <- dexp(xvals, rate=rate)
\end_layout
\begin_layout Plain Layout
plot(xvals, yvals1, type="l", main="", xlab="x", ylab="density")
\end_layout
\begin_layout Plain Layout
text(.7*max(xvals), .7*max(yvals1), label=bquote(f(x)==exp(-x)))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
placement H
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/ta10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Exponential Density
\begin_inset CommandInset label
LatexCommand label
name "cap:Exponentialrate1"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
f(x;\lambda)=\lambda e^{-\lambda x},\quad\mbox{where }x\geq0\label{eq:Exponential1}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
This is a very simple formula, with only one parameter,
\begin_inset Formula $\lambda$
\end_inset
, which is called the
\begin_inset Quotes eld
\end_inset
rate
\begin_inset Quotes erd
\end_inset
parameter.
In some texts, the parameter is expressed as the reciprocal of
\begin_inset Formula $\lambda,$
\end_inset
so the density would be
\begin_inset Formula
\begin{equation}
f(x;\mu)=\frac{1}{\mu}e^{-x/\mu}
\end{equation}
\end_inset
The letter
\begin_inset Formula $e$
\end_inset
stands for Euler's (pronounced
\begin_inset Quotes eld
\end_inset
oiler's
\begin_inset Quotes erd
\end_inset
) constant, roughly equal to
\begin_inset Formula $2.7182818\ldots$
\end_inset
.
The value
\begin_inset Formula $e^{x}$
\end_inset
is often represented as
\begin_inset Formula $exp(x)$
\end_inset
.
\end_layout
\begin_layout Standard
In some texts, the density function will be written with the parameter as
a subscript, as in
\begin_inset Formula $f_{\lambda}(x)$
\end_inset
.
That works well, except when there are several parameters.
Sometimes it is simply written as
\begin_inset Formula $f(x)$
\end_inset
and the parameters are implicit.
I prefer to include the parameters in parentheses after a semicolon, mainly
because they are printed in a more readable size.
\end_layout
\begin_layout Standard
Consider the exponential density when
\begin_inset Formula $\lambda=1$
\end_inset
,
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
f(x;\lambda=1)=e^{-x}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
All of these notations represent the same function:
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
exp(-x)=\frac{1}{exp(x)}=\frac{1}{e^{x}}
\]
\end_inset
\begin_inset Newline newline
\end_inset
The value of
\begin_inset Formula $exp(-x)$
\end_inset
shrinks smoothly toward 0 as
\begin_inset Formula $x$
\end_inset
grows to infinity (see Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "cap:Exponentialrate1"
\end_inset
).
\end_layout
\begin_layout Standard
If
\begin_inset Formula $\lambda$
\end_inset
is very small, the decline in the value of
\begin_inset Formula $f(x;\lambda)$
\end_inset
is very gradual.
The rates of decline are displayed in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "cap:Exponentialrate2"
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Branch R
status collapsed
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
rate <- 2
\end_layout
\begin_layout Plain Layout
upper <- 10
\end_layout
\begin_layout Plain Layout
xvals <- seq(0, upper, by=0.02)
\end_layout
\begin_layout Plain Layout
yvals1 <- dexp(xvals, rate=rate)
\end_layout
\begin_layout Plain Layout
plot(xvals, yvals1, type="l", xlab="x", ylab="density")
\end_layout
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
rate <- 1
\end_layout
\begin_layout Plain Layout
upper <- 10
\end_layout
\begin_layout Plain Layout
xvals <- seq(0, upper, by=0.02)
\end_layout
\begin_layout Plain Layout
yvals1 <- dexp(xvals, rate=rate)
\end_layout
\begin_layout Plain Layout
lines(xvals, yvals1, lty=2)
\end_layout
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
rate <- 0.2
\end_layout
\begin_layout Plain Layout
upper <- 10
\end_layout
\begin_layout Plain Layout
xvals <- seq(0, upper, by=0.02)
\end_layout
\begin_layout Plain Layout
yvals1 <- dexp(xvals, rate=rate)
\end_layout
\begin_layout Plain Layout
lines(xvals, yvals1, lty=3)
\end_layout
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
legend("topright", legend=as.expression(c(bquote(lambda == 2.0), bquote(lambda
== 1.0), bquote(lambda == 0.2))), lty=c(1,2,3))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
placement H
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Three Exponential Densities
\begin_inset CommandInset label
LatexCommand label
name "cap:Exponentialrate2"
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
begin{center}
\end_layout
\begin_layout Plain Layout
\backslash
includegraphics{plots/ta11}
\end_layout
\begin_layout Plain Layout
\backslash
end{center}
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Cumulative Distribution Function
\end_layout
\begin_layout Standard
The cumulative distribution, the probability that a randomly drawn value
will be smaller than
\begin_inset Formula $k$
\end_inset
, is a very workable problem for a student who has completed elementary
calculus.
\begin_inset Formula
\[
F(k;\lambda)=\int_{0}^{k}\lambda e^{-\lambda x}\, dx
\]
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
=-e^{-\lambda x}\mid_{0}^{k}
\]
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
=1-e^{-\lambda k}
\end{equation}
\end_inset
\end_layout
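The closed-form result can be checked against R's built-in pexp and against direct numerical integration of the density; the rate and cutoff values below are arbitrary:

```r
# Check the derived CDF 1 - exp(-lambda*k) against R's own functions
lambda <- 0.5
k <- 3
1 - exp(-lambda * k)                                        # the formula above
pexp(k, rate = lambda)                                      # built-in CDF
integrate(dexp, lower = 0, upper = k, rate = lambda)$value  # numerical integral
```

All three lines return the same probability.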
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
First, begin with the result.
The expected value of
\begin_inset Formula $x$
\end_inset
for an exponential distribution is
\begin_inset Formula
\begin{equation}
E[x]=\frac{1}{\lambda}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This is not difficult to derive.
Begin with the definition in expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:EV"
\end_inset
) and insert the exponential:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x]=\int_{0}^{\infty}f(x)\cdot x\, dx=\int_{0}^{\infty}\lambda e^{-\lambda x}x\, dx.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This can be calculated with integration by parts.
\begin_inset Formula
\begin{eqnarray*}
 & = & -xe^{-\lambda x}\mid_{0}^{\infty}+\int_{0}^{\infty}e^{-\lambda x}\, dx\\
 & = & 0-0+\left(-\frac{1}{\lambda}e^{-\lambda x}\right)\mid_{0}^{\infty}\\
 & = & -\lim_{x\rightarrow\infty}\frac{1}{\lambda}e^{-\lambda x}+\frac{1}{\lambda}e^{-\lambda\cdot0}\\
& = & 0+\frac{1}{\lambda}
\end{eqnarray*}
\end_inset
\end_layout
\begin_layout Standard
The variance of the exponential distribution is
\begin_inset Formula
\begin{equation}
Var[x]=\frac{1}{\lambda^{2}}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
The easiest way to demonstrate that with elementary tools is by remembering
that
\begin_inset Formula
\begin{equation}
Var[x]=E[x^{2}]-E[x]^{2}
\end{equation}
\end_inset
We have already derived
\begin_inset Formula $E[x]$
\end_inset
, so we just need to solve for
\begin_inset Formula
\begin{equation}
E[x^{2}]=\int_{0}^{\infty}\lambda e^{-\lambda x}x^{2}\, dx.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
The work requires integration by parts, twice.
When that is finished, we find
\begin_inset Formula
\[
E[x^{2}]=\frac{2}{\lambda^{2}}
\]
\end_inset
and so
\begin_inset Formula
\[
Var[x]=\frac{2}{\lambda^{2}}-\left(\frac{1}{\lambda}\right)^{2}=\frac{1}{\lambda^{2}}
\]
\end_inset
\begin_inset Newline newline
\end_inset
I've written this out because it is important for prospective researchers
to understand that results about distributions must be derived
\emph on
by someone
\emph default
before they can be put to use.
The characterization of a distribution can sometimes be a difficult problem
that will require tools from mathematical statistics.
\end_layout
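Someone has to do the calculus, but the answer can be confirmed numerically with R's integrate; the rate chosen below is arbitrary:

```r
# Numerical check of E[x] = 1/lambda and Var[x] = 1/lambda^2
lambda <- 2
EX  <- integrate(function(x) x   * dexp(x, rate = lambda), 0, Inf)$value
EX2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), 0, Inf)$value
EX             # approximately 0.5  = 1/lambda
EX2 - EX^2     # approximately 0.25 = 1/lambda^2
```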
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
The exponential distribution is sometimes used to describe waiting times
for events that are likely to happen quickly.
For example, if we ask,
\begin_inset Quotes eld
\end_inset
how long will we wait to hear a dial tone if we pick up a telephone,
\begin_inset Quotes erd
\end_inset
the answer (as of 2011, at least) is usually
\begin_inset Quotes eld
\end_inset
no time at all.
\begin_inset Quotes erd
\end_inset
However, sometimes there is a period of silence before the dial tone appears.
\end_layout
\begin_layout Standard
This distribution is simple and very workable.
Most applied research will be based on one of its more complicated relatives
that are described in the following sections, but one should not move past
the exponential too quickly.
It is the cornerstone of the
\begin_inset Quotes eld
\end_inset
exponential family
\begin_inset Quotes erd
\end_inset
of distributions, the family upon which the
\begin_inset Quotes eld
\end_inset
generalized linear model
\begin_inset Quotes erd
\end_inset
is based.
\end_layout
\begin_layout Subsection
Normal Distribution
\end_layout
\begin_layout Standard
This is the single most important distribution in statistics.
It is unimodal and symmetric on the real number line.
It may represent observed variables like IQ scores, while it also plays
a vital role in the study of sampling distributions.
When we draw samples over and over and calculate estimates from them, those
estimates are likely to be normally distributed (this result is known as
the Central Limit Theorem).
\end_layout
\begin_layout Standard
The starting point of the normal is the apparently simple function,
\begin_inset Formula $exp(-x^{2})$
\end_inset
, which is illustrated in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:NormalKernel"
\end_inset
.
It appears that we might simply have wondered out loud,
\begin_inset Quotes eld
\end_inset
what happens if we square the input in an exponential distribution?
\begin_inset Quotes erd
\end_inset
As we shall see below, the distribution may be moved to the left and right,
or it may be stretched or squeezed, but the essence of it is simply
\begin_inset Formula $exp(-x^{2})$
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(from=-3, to=3, by=0.1)
\end_layout
\begin_layout Plain Layout
y <- exp(-0.5*x^2)
\end_layout
\begin_layout Plain Layout
plot(x, y, type="l",xlab="x",ylab="",main="")
\end_layout
\begin_layout Plain Layout
text(-2, 0.75*max(y), label=expression(exp(-x^2)))
\end_layout
\begin_layout Plain Layout
abline(v=0,lty=4)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/ta20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
The Simplified Kernel of a Normal Distribution
\begin_inset CommandInset label
LatexCommand label
name "fig:NormalKernel"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Changes in
\begin_inset Formula $\mu$
\end_inset
and
\begin_inset Formula $\sigma^{2}$
\end_inset
have rather superficial effects on the distribution.
They shift and scale it, nothing more interesting.
As a result, it is common to standardize a random variable by calculating
it as a
\begin_inset Formula $Z$
\end_inset
statistic.
\begin_inset Formula
\begin{equation}
Z_{i}=\frac{x_{i}-\mu}{\sigma}\label{eq:Zstatistic}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This new variable,
\begin_inset Formula $Z_{i}$
\end_inset
, follows the standardized normal distribution,
\begin_inset Formula $N(0,1)$
\end_inset
.
It is important to note that we do not lose any information by converting
\begin_inset Formula $x_{i}$
\end_inset
into
\begin_inset Formula $Z_{i}$
\end_inset
.
The original variable
\begin_inset Formula $x_{i}$
\end_inset
can be recovered from
\begin_inset Formula $Z_{i}$
\end_inset
by multiplication, as in
\begin_inset Formula
\begin{equation}
x_{i}=\mu+Z_{i}\sigma.\label{eq:XfromZ}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This offers hints about a way to generate random samples from
\begin_inset Formula $N(\mu,\sigma^{2})$
\end_inset
.
Many computer programs have built-in generators for standardized random
normal variables.
We can draw observations from
\begin_inset Formula $N(0,1)$
\end_inset
and then rescale, as in expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:XfromZ"
\end_inset
), to obtain draws from
\begin_inset Formula $N(\mu,\sigma^{2}).$
\end_inset
\end_layout
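That recipe can be sketched in a few lines of R; the values of mu and sigma below are arbitrary:

```r
# Draws from N(0,1), rescaled to N(mu, sigma^2)
mu <- 100
sigma <- 15
z <- rnorm(10000)        # standardized normal draws
x <- mu + sigma * z      # rescaled draws, x = mu + Z*sigma
mean(x)                  # close to mu
sd(x)                    # close to sigma
```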
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
The normal distribution, which I refer to as
\begin_inset Formula $N(\mu,\sigma^{2})$
\end_inset
, describes a continuous variable that takes on values in the real number
line.
The two parameters,
\begin_inset Formula $\mu$
\end_inset
and
\begin_inset Formula $\sigma^{2}$
\end_inset
, determine the location and scale of the distribution.
The probability density function is
\begin_inset Formula
\begin{equation}
f(x;\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)},\,-\infty<x<+\infty
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
Adjusting the
\begin_inset Formula $\mu$
\end_inset
parameter shifts the distribution to the left or the right without changing
its shape, as illustrated in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Normalmushift"
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
mu <- c(3, 5, 6)
\end_layout
\begin_layout Plain Layout
sigma <- 5
\end_layout
\begin_layout Plain Layout
x <- seq(from=mu[1]-3*sigma, to=mu[1]+3*sigma, by=0.2)
\end_layout
\begin_layout Plain Layout
y1 <- dnorm(x, mean=mu[1], sd=sigma, log=F)
\end_layout
\begin_layout Plain Layout
plot(x, y1, type="l", main="", xlab="x", ylab="probability of x", xlim=c(-20,20),
ylim=c(0,.12))
\end_layout
\begin_layout Plain Layout
x2 <- seq(from=mu[2]-3*sigma, to=mu[2]+3*sigma, by=0.2)
\end_layout
\begin_layout Plain Layout
y2 <- dnorm(x2, mean=mu[2], sd=sigma, log=F)
\end_layout
\begin_layout Plain Layout
lines(x2, y2, lty=2)
\end_layout
\begin_layout Plain Layout
x3 <- seq(from=mu[3]-3*sigma, to=mu[3]+3*sigma, by=0.2)
\end_layout
\begin_layout Plain Layout
y3 <- dnorm(x3, mean=mu[3], sd=sigma, log=F)
\end_layout
\begin_layout Plain Layout
lines(x3,y3, lty=3)
\end_layout
\begin_layout Plain Layout
abline(v = c(mu[1], mu[2],mu[3]), lty = c(1,2,3), lwd = 0.3, col = "gray70")
\end_layout
\begin_layout Plain Layout
legend("topright", legend = as.expression(c(bquote(mu == .(mu[1])), bquote(mu
== .(mu[2])), bquote(mu == .(mu[3])))), lty=c(1,2,3) )
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tn10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Three Normal Distributions
\begin_inset CommandInset label
LatexCommand label
name "fig:Normalmushift"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
On the other hand, adjusting the
\begin_inset Formula $\sigma^{2}$
\end_inset
parameter changes the scale of the distribution.
If
\begin_inset Formula $\sigma^{2}$
\end_inset
is very small, then points are tightly clustered around
\begin_inset Formula $\mu$
\end_inset
.
Of course, it is possible to adjust both the location (
\begin_inset Formula $\mu$
\end_inset
) and scale (
\begin_inset Formula $\sigma^{2}$
\end_inset
) at the same time.
That is illustrated in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Compare2Normal"
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status collapsed
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
m1 = 10
\end_layout
\begin_layout Plain Layout
sd1 = 20
\end_layout
\begin_layout Plain Layout
x <- seq(m1 - 3 * sd1, m1 + 3 * sd1, length = 200)
\end_layout
\begin_layout Plain Layout
prob1 <- dnorm(x, m = m1, sd = sd1)
\end_layout
\begin_layout Plain Layout
plot(x, prob1, ylab = "Probability Density", main = "",
\end_layout
\begin_layout Plain Layout
type = "l", ylim = c(0, max(prob1) * 1.3))
\end_layout
\begin_layout Plain Layout
m2 = 4
\end_layout
\begin_layout Plain Layout
sd2 = 15
\end_layout
\begin_layout Plain Layout
prob2 <- dnorm(x, m = m2, sd = sd2)
\end_layout
\begin_layout Plain Layout
lines(x, prob2, lty = 2)
\end_layout
\begin_layout Plain Layout
legend("topright", legend = c(paste("mu=", m1,
\end_layout
\begin_layout Plain Layout
"sigma=", sd1), paste("mu=", m2, "sigma=",
\end_layout
\begin_layout Plain Layout
sd2)), lty = 1:2)
\end_layout
\begin_layout Plain Layout
abline(h = seq(0, max(prob1), length.out = 5),
\end_layout
\begin_layout Plain Layout
lty = 5, lwd = 0.3, col = "gray70")
\end_layout
\begin_layout Plain Layout
abline(v = c(m1, m2), lty = 5, lwd = 0.3, col = "gray70")
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tn20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Compare 2 Normal Distributions
\begin_inset CommandInset label
LatexCommand label
name "fig:Compare2Normal"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Cumulative Distribution Function
\end_layout
\begin_layout Standard
One of the truly frustrating facts in statistics is that the CDF of the
normal cannot be simplified into an easily calculated formula.
The chance of an outcome smaller than
\begin_inset Formula $k$
\end_inset
cannot be written down more simply than the CDF itself,
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
F(k;\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\int_{-\infty}^{k}e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}dx
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Hence, when we need to make calculations of
\begin_inset Formula $F(k;\mu,\sigma^{2})$
\end_inset
, it is necessary to perform numerical integration.
That is a difficult prospect.
It was done in the days before computers by teams of calculating assistants
who prepared large tables of solutions that were published in the appendices
of most statistics texts.
I was stunned to notice recently that the statistics text with which I
have been teaching no longer includes those tables, presumably because
they are
\begin_inset Quotes eld
\end_inset
in the computer.
\begin_inset Quotes erd
\end_inset
\end_layout
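\begin_layout Standard
Today the numerical work is a single function call; R's pnorm function
 evaluates the normal CDF.
 A quick illustration:
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
## F(k; mu, sigma): chance of an outcome below k
\end_layout
\begin_layout Plain Layout
pnorm(0, mean = 0, sd = 1)    ## 0.5, by symmetry
\end_layout
\begin_layout Plain Layout
pnorm(1.96, mean = 0, sd = 1) ## roughly 0.975
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout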
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
Since the normal is unimodal and symmetric, it should come as no surprise
that the expected value of
\begin_inset Formula $x$
\end_inset
,
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{eqnarray}
E[x] & = & \int_{-\infty}^{\infty}f(x;\mu,\sigma^{2})\cdot x\, dx\\
 & = & \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\cdot x\, dx\nonumber
\end{eqnarray}
\end_inset
is simply the center point of the distribution,
\begin_inset Formula $\mu$
\end_inset
.
That is,
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x]=\mu
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
The
\series bold
variance
\series default
of a distribution is the
\begin_inset Quotes eld
\end_inset
probability weighted sum
\begin_inset Quotes erd
\end_inset
of the squared differences between outcomes and their expected values.
It is a little bit harder to believe that this complicated expression
\begin_inset Formula
\begin{eqnarray}
Var[x] & = & \int_{-\infty}^{\infty}f(x;\mu,\sigma^{2})\cdot\left(x-E[x]\right)^{2}\, dx\\
 & = & \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\cdot\left(x-E[x]\right)^{2}\, dx\nonumber
\end{eqnarray}
\end_inset
simplifies to this:
\begin_inset Formula
\begin{equation}
Var[x]=\sigma^{2}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This claim can be derived with a tool called a
\begin_inset Quotes eld
\end_inset
moment generating function
\begin_inset Quotes erd
\end_inset
that is presented in the first part of a course on mathematical statistics.
\end_layout
\begin_layout Standard
The main point here is that the normal distribution's expected value and
variance are extremely simple results.
The parameter
\begin_inset Formula $\mu$
\end_inset
happens to be the expected value.
It is also the mode and the median.
The variance happens to be the parameter
\begin_inset Formula $\sigma^{2}$
\end_inset
.
It is not usually true that a distribution's parameters end up being
equal to its expected value and variance.
In some ways, we are spoiled by the normal distribution.
\end_layout
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
The normal distribution has many interesting qualities.
Although in some ways it is a complicated function, in some ways it is
very easy to work with.
No doubt, on the difficult side of the ledger, we have the problem that
the cumulative distribution function has no workable analytic solution.
The only way to calculate the chances of an outcome between two points is
by numerical approximation.
That makes computer programs run more slowly and, unless we are very careful,
less accurately than they should.
\end_layout
\begin_layout Standard
On the easy side of the ledger, however, it is not too hard to calculate
joint probabilities.
Consider, for example, the chance that 2 independent observations from
\begin_inset Formula $N(\mu,\sigma^{2})$
\end_inset
will be equal to particular values
\begin_inset Formula $x_{1}$
\end_inset
and
\begin_inset Formula $x_{2}$
\end_inset
.
That will be the product of the two densities,
\begin_inset Formula
\begin{equation}
f(x_{1};\mu,\sigma^{2})\cdot f(x_{2};\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2\sigma^{2}}(x_{1}-\mu)^{2}}\times\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2\sigma^{2}}(x_{2}-\mu)^{2}}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
We can group like terms and use the laws of exponents to write this as
\begin_inset Formula
\begin{equation}
\frac{1}{2\pi\sigma^{2}}\, e^{-\frac{1}{2\sigma^{2}}\{(x_{1}-\mu)^{2}+(x_{2}-\mu)^{2}\}}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This generalizes easily if we need to know the chance that
\begin_inset Formula $N$
\end_inset
separate observations will be
\begin_inset Formula $x_{1},$
\end_inset
\begin_inset Formula $x_{2},\ldots,x_{N}$
\end_inset
,
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\frac{1}{(2\pi\sigma^{2})^{N/2}}\, e^{-\frac{1}{2\sigma^{2}}\{\sum_{i=1}^{N}(x_{i}-\mu)^{2}\}}\label{eq:NormProduct}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
Why is this useful? In maximum likelihood analysis, we frequently need to
calculate optimal estimates for
\begin_inset Formula $\mu$
\end_inset
and
\begin_inset Formula $\sigma^{2}$
\end_inset
.
To my eye, it is quite obvious that maximizing this expression is the same
as minimizing the sum of squares in the exponent.
If that is not obvious to you, remember that
\end_layout
\begin_layout Itemize
a parameter estimate that maximizes a function also maximizes a monotone
transformation of that function, and
\end_layout
\begin_layout Itemize
maximizing a function is the same as minimizing its negative.
\end_layout
\begin_layout Standard
The logarithm is a monotone transform.
Take the natural log of (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:NormProduct"
\end_inset
),
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{eqnarray}
 & & \ln\left(\frac{1}{(2\pi\sigma^{2})^{N/2}}\right)+\ln\left[\, e^{-\frac{1}{2\sigma^{2}}\{\sum_{i=1}^{N}(x_{i}-\mu)^{2}\}}\right]\label{eq:NormProduct1}\\
 & = & -\frac{N}{2}\ln(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}\{\sum_{i=1}^{N}(x_{i}-\mu)^{2}\}.\nonumber
\end{eqnarray}
\end_inset
\begin_inset Newline newline
\end_inset
It turns out that the maximum likelihood estimate of
\begin_inset Formula $\mu$
\end_inset
is the sample average:
\begin_inset Formula
\begin{equation}
\hat{\mu}=\frac{1}{N}\sum_{i=1}^{N}x_{i}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
The maximum likelihood estimate of
\begin_inset Formula $\sigma^{2}$
\end_inset
is
\begin_inset Formula
\begin{equation}
\widehat{\sigma^{2}}=\frac{\sum_{i=1}^{N}(x_{i}-\hat{\mu})^{2}}{N}.
\end{equation}
\end_inset
\end_layout
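\begin_layout Standard
Where does the estimate of
\begin_inset Formula $\mu$
\end_inset
 come from? Differentiate the log likelihood with respect to
\begin_inset Formula $\mu$
\end_inset
 and set the derivative equal to zero:
\begin_inset Formula 
\begin{equation}
\frac{1}{\sigma^{2}}\sum_{i=1}^{N}(x_{i}-\mu)=0\quad\Longrightarrow\quad\hat{\mu}=\frac{1}{N}\sum_{i=1}^{N}x_{i}.
\end{equation}
\end_inset
\end_layout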
\begin_layout Standard
These are clean, workable formulas.
We did not run into any complicated
\begin_inset Quotes eld
\end_inset
numerical approximations
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
iterative algorithms.
\begin_inset Quotes erd
\end_inset
Many ML estimators require more difficult calculation, but it is worth
remembering that at least one of them is easy.
\end_layout
\begin_layout Standard
When calculating the variance from a sample, many students in introductory
statistics courses have been terrorized by the question,
\begin_inset Quotes eld
\end_inset
should we divide by
\begin_inset Formula $N$
\end_inset
or
\begin_inset Formula $N-1$
\end_inset
?
\begin_inset Quotes erd
\end_inset
Professors usually respond by dissembling and equivocating, or reciting
a poem about
\begin_inset Quotes eld
\end_inset
used degrees of freedom.
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Standard
But I'm here to give a straight answer.
If the student wants the ML estimator, the answer is
\begin_inset Quotes eld
\end_inset
divide by
\begin_inset Formula $N$
\end_inset
.
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Standard
On the other hand, the ML estimator may not be the one the professor wants.
The expected value of the ML estimator of the variance is not equal to
the true
\begin_inset Formula $Var[x]=\sigma^{2}$
\end_inset
.
That is, it can be shown that
\begin_inset Formula
\begin{equation}
E[\widehat{\sigma^{2}}]=\sigma^{2}-\frac{\sigma^{2}}{N}=\frac{N-1}{N}\sigma^{2}
\end{equation}
\end_inset
The ML estimator is just a bit too small.
It equals the
\begin_inset Quotes eld
\end_inset
true variance
\begin_inset Quotes erd
\end_inset
\begin_inset Formula $\sigma^{2}$
\end_inset
minus a fraction that depends on
\begin_inset Formula $N$
\end_inset
.
An unbiased formula for estimating the variance from a sample is
\begin_inset Formula
\begin{equation}
\widehat{\sigma^{2}}=\frac{\sum_{i=1}^{N}(x_{i}-\hat{\mu})^{2}}{N-1}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
So, the complete answer to the student's question is a question that should
be phrased back to the professor.
\begin_inset Quotes eld
\end_inset
Would you like an ML estimate or would you like an unbiased estimate?
\begin_inset Quotes erd
\end_inset
\end_layout
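\begin_layout Standard
R takes a side here: the built-in var function divides by
\begin_inset Formula $N-1$
\end_inset
, so the ML estimate is a simple rescaling.
 A small sketch with a made-up sample:
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
x <- c(2, 4, 4, 4, 5, 5, 7, 9) ## made-up sample
\end_layout
\begin_layout Plain Layout
N <- length(x)
\end_layout
\begin_layout Plain Layout
var(x)               ## unbiased: divides by N - 1
\end_layout
\begin_layout Plain Layout
var(x) * (N - 1) / N ## ML estimate: divides by N
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout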
\begin_layout Subsection
Uniform Distribution
\end_layout
\begin_layout Standard
The uniform distribution has already been discussed in Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sub:CumulativeDistributionFunction"
\end_inset
and illustrated in Figures
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Uniform10"
\end_inset
through
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Uniform30"
\end_inset
.
It is presented here mainly for completeness.
\end_layout
\begin_layout Standard
The uniform is a continuous distribution that is defined between two points,
\begin_inset Formula $[a,b]$
\end_inset
.
All points between
\begin_inset Formula $a$
\end_inset
and
\begin_inset Formula $b$
\end_inset
are equally likely.
\end_layout
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
The probability density of the uniform is
\begin_inset Quotes eld
\end_inset
flat
\begin_inset Quotes erd
\end_inset
.
It does not have parameters in the usual sense.
Its height is simply determined by its width.
If the range is
\begin_inset Formula $a=0$
\end_inset
to
\begin_inset Formula $b=1$
\end_inset
, then the height of the density has to be
\begin_inset Formula $1.0$
\end_inset
in order to guarantee that the total area under the curve is equal to
\begin_inset Formula $1.0.$
\end_inset
On the other hand, if the uniform has to stretch from
\begin_inset Formula $a=10$
\end_inset
to
\begin_inset Formula $b=+10$
\end_inset
, then the height of the curve must be
\begin_inset Formula $1/20=0.05$
\end_inset
.
\end_layout
\begin_layout Standard
If one grasps that simple fact, then it is easy to see the PDF is
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
f(x)=\frac{1}{b-a}
\end{equation}
\end_inset
\end_layout
\begin_layout Subsubsection
Cumulative Distribution Function
\end_layout
\begin_layout Standard
The graph of the linkage between the PDF and the CDF has already been presented
in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Uniform30"
\end_inset
.
The formal representation of the chance of an outcome smaller than
\begin_inset Formula $k$
\end_inset
is
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{eqnarray}
F(k) & = & \int_{a}^{k}\frac{1}{b-a}dx\nonumber \\
 & = & \frac{1}{b-a}(k-a)
\end{eqnarray}
\end_inset
\end_layout
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
The expected value is exactly in the center of the domain, half way between
\begin_inset Formula $a$
\end_inset
and
\begin_inset Formula $b$
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
E[x]=\frac{1}{2}(a+b)
\]
\end_inset
\begin_inset Newline newline
\end_inset
The variance depends on the width of the domain, of course.
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
Var[x]=\frac{1}{12}(b-a)^{2}
\]
\end_inset
\end_layout
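\begin_layout Standard
Both results follow from direct integration.
 For the expected value,
\begin_inset Formula 
\[
E[x]=\int_{a}^{b}\frac{x}{b-a}\, dx=\frac{b^{2}-a^{2}}{2(b-a)}=\frac{1}{2}(a+b),
\]
\end_inset
and the variance is obtained the same way from
\begin_inset Formula $E[x^{2}]-(E[x])^{2}$
\end_inset
.
\end_layout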
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
The uniform represents the idea that any outcome is equally likely.
It is mainly useful as a theoretical representation of the idea that there
is no basis for predicting one value over another.
To a Bayesian statistician, a
\begin_inset Quotes eld
\end_inset
uniform prior
\begin_inset Quotes erd
\end_inset
is used to mean that a person is completely unsure about what to expect.
In many game theory models, the uniform distribution is used because it
is very easy to work with.
\end_layout
\begin_layout Subsection
Gamma Distribution
\end_layout
\begin_layout Standard
The gamma distribution is continuous and defined for positive real numbers,
\begin_inset Formula $[0,\infty)$
\end_inset
.
Depending on the values of its parameters, it may be either
\begin_inset Quotes eld
\end_inset
ski-slope
\begin_inset Quotes erd
\end_inset
shaped or it may be single-peaked, with a more or less exaggerated tail
on the right.
It can be used to represent the density of any variable that is restricted
to non-negative values, and is frequently used in models of waiting times
and survival.
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
xvals <- seq(0,10,length.out=1000)
\end_layout
\begin_layout Plain Layout
gam1 <- dgamma(xvals, shape=1, scale=1)
\end_layout
\begin_layout Plain Layout
gam2 <- dgamma(xvals, shape=2, scale=1)
\end_layout
\begin_layout Plain Layout
plot(xvals, gam1, type="l", xlab="x",ylab="Gamma probability density",
ylim=c(0,1))
\end_layout
\begin_layout Plain Layout
lines(xvals, gam2, lty=2)
\end_layout
\begin_layout Plain Layout
text(.4, .7, "shape=1, scale=1", pos=4, col=1)
\end_layout
\begin_layout Plain Layout
text(3, .2, "shape=2, scale=1", pos=4, col=1)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tGamma1}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Gamma Density
\begin_inset CommandInset label
LatexCommand label
name "fig:GammaDistribution"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
Like the beta and the normal, the gamma distribution is a two parameter
distribution.
The parameters are often called shape (
\begin_inset Formula $\alpha$
\end_inset
) and scale
\begin_inset Formula $(\beta)$
\end_inset
.
I will refer to it as
\begin_inset Formula $Gamma(\alpha,\beta)$
\end_inset
.
The PDF is
\begin_inset Formula
\begin{equation}
f(x)=\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta},\quad\mbox{where }x\geq0,\;\alpha>0,\;\beta>0.\label{eq:GammaPDF}
\end{equation}
\end_inset
The symbol
\begin_inset Formula $\Gamma(\alpha)$
\end_inset
is a normalizing constant.
It is known as the gamma function.
It can be thought of as an extension of the factorial function to the real
number line.
For integers,
\begin_inset Formula $\Gamma(\alpha)=(\alpha-1)!$
\end_inset
\end_layout
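\begin_layout Standard
The factorial connection is easy to check in R, where the gamma function
 is available as gamma():
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
gamma(5)     ## 24, the same as factorial(4)
\end_layout
\begin_layout Plain Layout
factorial(4) ## 24
\end_layout
\begin_layout Plain Layout
gamma(5.5)   ## also defined between the integers
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout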
\begin_layout Standard
How would someone ever think of a horrible formula like that? If I were
just making this up, I would reason as follows.
Start with the exponential distribution's PDF, in which the essential shape
is determined by
\begin_inset Formula $e^{-x}$
\end_inset
.
That is a bit boring and inflexible; it is always smoothly declining from
left to right.
To spice that up a bit, multiply by
\begin_inset Formula $x^{\alpha-1}$
\end_inset
.
\begin_inset Formula
\begin{equation}
x^{\alpha-1}e^{-x}\label{eq:GammaFn}
\end{equation}
\end_inset
If
\begin_inset Formula $\alpha=1$
\end_inset
, this just reproduces the exponential (since
\begin_inset Formula $x^{0}=1$
\end_inset
).
However, if
\begin_inset Formula $\alpha>1,$
\end_inset
the shape changes.
We have a single-peaked function with a mode in the interior of the domain,
as illustrated in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Gamma2"
\end_inset
.
That is why
\begin_inset Formula $\alpha$
\end_inset
is called a
\begin_inset Quotes eld
\end_inset
shape
\begin_inset Quotes erd
\end_inset
parameter.
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(0,10,length.out=1000)
\end_layout
\begin_layout Plain Layout
alpha <- c(1, 2.5, 5)
\end_layout
\begin_layout Plain Layout
y1 <- x^(alpha[1]-1)*exp(-x)
\end_layout
\begin_layout Plain Layout
y2 <- x^(alpha[2]-1)*exp(-x)
\end_layout
\begin_layout Plain Layout
y3 <- x^(alpha[3]-1)*exp(-x)
\end_layout
\begin_layout Plain Layout
plot(x, y1, type="l", xlab="x", ylab=expression(x^{alpha-1}*e^{-x}), ylim=c(0,6))
\end_layout
\begin_layout Plain Layout
lines(x, y2, lty=2)
\end_layout
\begin_layout Plain Layout
lines(x, y3, lty=3)
\end_layout
\begin_layout Plain Layout
legend("topright", legend=c(expression(alpha==1), expression(alpha==2.5), expression(alpha==5)), lty=c(1,2,3))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tGamma2}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Gamma Kernel (Unscaled)
\begin_inset CommandInset label
LatexCommand label
name "fig:Gamma2"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaFn"
\end_inset
) is the basis for my new probability model, but the area under the curve
is not 1.0.
A normalizing constant is required.
Clearly, it must be
\begin_inset Formula
\begin{equation}
\Gamma(\alpha)=\int_{0}^{\infty}x^{\alpha-1}e^{-x}dx.\label{eq:GammaFn2}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
That function,
\begin_inset Formula $\Gamma(\alpha)$
\end_inset
, is called the gamma function.
If we divide (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaFn"
\end_inset
) by (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaFn2"
\end_inset
), we have a valid pdf.
We have almost finished the derivation of (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaPDF"
\end_inset
), except that we have not yet introduced
\begin_inset Formula $\beta$
\end_inset
.
However, that is easily remedied.
Replace the variable
\begin_inset Formula $x$
\end_inset
by a new variable equal to
\begin_inset Formula $x/\beta$
\end_inset
(and employ a change of variables), and we find the result is exactly right.
\end_layout
\begin_layout Standard
The gamma distribution seems to
\begin_inset Quotes eld
\end_inset
pop up
\begin_inset Quotes erd
\end_inset
in many contexts.
Recall that the exponential distribution describes the time we have to
wait before an event occurs.
The gamma can be derived as a model for the amount of time we have to wait
for that event to repeat itself several times.
It makes sense, then, that the gamma distribution with
\begin_inset Formula $\alpha=1$
\end_inset
, that is,
\begin_inset Formula $Gamma(1,\beta)$
\end_inset
, is identical to the exponential distribution (because we only waited for
the event one time).
\begin_inset Formula $Gamma(2,\beta)$
\end_inset
would represent the time we wait for two events, and so forth.
This interpretation can be used to derive the gamma's pdf, but we have
to be somewhat cautious about the interpretation.
The shape parameter
\begin_inset Formula $\alpha$
\end_inset
can take on any real value greater than 0; it is not limited to integers
like
\begin_inset Formula $1$
\end_inset
,
\begin_inset Formula $2$
\end_inset
, and so forth.
The interpretation of those non-integer
\begin_inset Formula $\alpha$
\end_inset
values is not facilitated by the
\begin_inset Quotes eld
\end_inset
waiting for events
\begin_inset Quotes erd
\end_inset
interpretation.
\end_layout
\begin_layout Subsubsection
Cumulative Distribution Function
\end_layout
\begin_layout Standard
The CDF represents the chance of a score lower than
\begin_inset Formula $k$
\end_inset
.
As one might expect from the functional form, this integral is difficult
to solve:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
F(k;\alpha,\beta)=\int_{0}^{k}\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta}dx.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
We can take note of the fact that
\begin_inset Formula $\Gamma(\alpha)\beta^{\alpha}$
\end_inset
does not depend on
\begin_inset Formula $x$
\end_inset
to rewrite this as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
F(k;\alpha,\beta)=\frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,\int_{0}^{k}x^{\alpha-1}e^{-x/\beta}dx.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
As far as I know, it cannot be further simplified, which means that in order
to calculate
\begin_inset Formula $F(k;\alpha,\beta)$
\end_inset
, it is necessary to numerically approximate the area under the curve
described by the integral.
\end_layout
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
The expected value of the gamma is the product of its two parameters.
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x_{i}]=\alpha\cdot\beta\label{eq:GammaEx}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This indicates that one can shift the distribution to the right by increasing
either the shape or the scale parameter.
\end_layout
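\begin_layout Standard
This expected value can be verified with the recursion
\begin_inset Formula $\Gamma(\alpha+1)=\alpha\Gamma(\alpha)$
\end_inset
:
\begin_inset Formula 
\begin{equation}
E[x]=\frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_{0}^{\infty}x^{\alpha}e^{-x/\beta}dx=\frac{\Gamma(\alpha+1)\beta^{\alpha+1}}{\Gamma(\alpha)\beta^{\alpha}}=\alpha\beta.
\end{equation}
\end_inset
\end_layout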
\begin_layout Standard
The variance of the gamma responds to both parameters as well, but it is
more sensitive to the scale parameter.
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
Var[x_{i}]=\alpha\cdot\beta^{2}\label{eq:GammaVar}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
As one should expect from the graph of the gamma distribution, the expected
value does not coincide with the mode.
If
\begin_inset Formula $\alpha>1$
\end_inset
, then the distribution has a singlepeaked appearance in which the most
likely outcome, the mode, is
\begin_inset Formula
\begin{equation}
mode=\beta(\alpha-1)
\end{equation}
\end_inset
\end_layout
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
The gamma distribution is important partly because it is flexible enough
to describe a variety of possible beliefs about the chances of outcomes
on the positive real numbers.
The gamma distribution was used to summarize my beliefs about the likely
number of breakouts by my neighbor's dog in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:DogEscapeEV"
\end_inset
.
\end_layout
\begin_layout Standard
Another important point is that the gamma distribution is centrally located
in the family of distributions.
Other distributions can be seen as special cases of the gamma.
For example, the
\begin_inset Formula $\chi^{2}(\nu)$
\end_inset
distribution (which is discussed below) has the same PDF as
\begin_inset Formula $Gamma(\frac{\nu}{2},2)$
\end_inset
.
If
\begin_inset Formula $\alpha=1$
\end_inset
, the gamma simplifies into an exponential distribution (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:Exponential1"
\end_inset
).
\end_layout
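\begin_layout Standard
The
\begin_inset Formula $\chi^{2}$
\end_inset
 connection can be confirmed numerically.
 For
\begin_inset Formula $\nu=4$
\end_inset
 degrees of freedom, the two densities agree point by point:
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
x <- seq(0.5, 10, by = 0.5)
\end_layout
\begin_layout Plain Layout
## chi-square with nu = 4 versus Gamma(shape = 4/2, scale = 2)
\end_layout
\begin_layout Plain Layout
all.equal(dchisq(x, df = 4), dgamma(x, shape = 2, scale = 2)) ## TRUE
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout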
\begin_layout Standard
The gamma distribution has special properties that are inherited by all
of its special cases.
One of the most intriguing facts about the gamma distribution is the additivity
property.
The sum of observations from gamma distributions with various
\begin_inset Formula $\alpha_{i}$
\end_inset
, but the same scale (
\begin_inset Formula $\beta$
\end_inset
), is distributed as
\begin_inset Formula $Gamma(\alpha_{1}+\ldots+\alpha_{n},\beta)$
\end_inset
.
\end_layout
\begin_layout Standard
The gamma's probability density function is easy to
\begin_inset Quotes eld
\end_inset
reparameterize
\begin_inset Quotes erd
\end_inset
for particular projects.
In particular, we can fiddle around with the scale parameter to achieve
various desired effects.
In regression models of
\begin_inset Quotes eld
\end_inset
count
\begin_inset Quotes erd
\end_inset
data, sometimes we need to add
\begin_inset Quotes eld
\end_inset
noise
\begin_inset Quotes erd
\end_inset
in the form of a positive random variable that has a fixed expected value
but an adjustable amount of variance.
For that purpose, set the gamma scale parameter to be
\begin_inset Formula $1/\alpha$
\end_inset
.
The expected value will be
\begin_inset Formula
\begin{equation}
E[x]=\alpha\cdot\frac{1}{\alpha}=1.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
As we adjust
\begin_inset Formula $\alpha$
\end_inset
, the expected value stays fixed at 1, but the variance is still sensitive.
\begin_inset Formula
\begin{equation}
Var[x]=\alpha(\frac{1}{\alpha})^{2}=\frac{1}{\alpha}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Three probability density curves are plotted for the special case in which
\begin_inset Formula $\beta=1/\alpha$
\end_inset
in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Gamma20"
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
xvals <- seq(0,10,length.out=1000)
\end_layout
\begin_layout Plain Layout
gam1 <- dgamma(xvals, shape=2, scale=1/2)
\end_layout
\begin_layout Plain Layout
gam2 <- dgamma(xvals, shape=10, scale=1/10)
\end_layout
\begin_layout Plain Layout
gam3 <- dgamma(xvals, shape=50, scale=1/50)
\end_layout
\begin_layout Plain Layout
plot(xvals, gam1, type="l", xlab="x",ylab="Gamma probability density",
ylim=c(0,2))
\end_layout
\begin_layout Plain Layout
lines(xvals, gam2, lty=2)
\end_layout
\begin_layout Plain Layout
lines(xvals, gam3, lty=3)
\end_layout
\begin_layout Plain Layout
legend("topright", legend=c("alpha=2","alpha=10","alpha=50"),lty=c(1,2,3))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tGamma20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Gamma Density when
\begin_inset Formula $\beta=1/\alpha$
\end_inset
\begin_inset CommandInset label
LatexCommand label
name "fig:Gamma20"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
From sample data, the gamma's parameters can be estimated in a variety of
ways.
The maximum likelihood estimate can be derived by iterative approximation,
but there is no
\begin_inset Quotes eld
\end_inset
closed form
\begin_inset Quotes erd
\end_inset
solution from which to calculate estimates of
\begin_inset Formula $\alpha$
\end_inset
and
\begin_inset Formula $\beta$
\end_inset
.
The
\begin_inset Quotes eld
\end_inset
method of moments
\begin_inset Quotes erd
\end_inset
can be used to get a quick
\begin_inset Quotes eld
\end_inset
first take
\begin_inset Quotes erd
\end_inset
on parameter estimates.
The method proceeds as follows.
First, calculate the sample mean and variance.
Let's call these
\begin_inset Formula $\widehat{E[x]}$
\end_inset
and
\begin_inset Formula $\widehat{Var[x]}$
\end_inset
.
Second, use those values in place of
\begin_inset Formula $E[x]$
\end_inset
and
\begin_inset Formula $Var[x]$
\end_inset
in expressions (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaEx"
\end_inset
) and (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaVar"
\end_inset
).
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\widehat{E[x]}=\hat{\alpha}\cdot\hat{\beta}\label{eq:GammaEx1}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
and
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\widehat{Var[x]}=\hat{\alpha}\cdot\hat{\beta}^{2}\label{eq:GammaVar1}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
This is called the
\begin_inset Quotes eld
\end_inset
method of moments
\begin_inset Quotes erd
\end_inset
because we have proceeded as though the sample mean and variance are actually
equal to the theoretical moments.
Those expressions can be rearranged so that
\begin_inset Formula
\begin{equation}
\hat{\alpha}=\frac{\widehat{E[x]}^{2}}{\widehat{Var[x]}}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\hat{\beta}=\frac{\widehat{Var[x]}}{\widehat{E[x]}}
\end{equation}
\end_inset
\end_layout
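\begin_layout Standard
Those rearranged formulas drop directly into R.
 A sketch with a made-up sample (using the sample variance from var as
 a stand-in for the theoretical moment):
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
x <- c(1.2, 0.8, 2.5, 1.1, 3.0, 0.6, 1.7, 2.2) ## made-up sample
\end_layout
\begin_layout Plain Layout
m <- mean(x)
\end_layout
\begin_layout Plain Layout
v <- var(x)
\end_layout
\begin_layout Plain Layout
alpha.hat <- m^2 / v
\end_layout
\begin_layout Plain Layout
beta.hat <- v / m
\end_layout
\begin_layout Plain Layout
c(alpha.hat, beta.hat)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout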
\begin_layout Subsection
Beta Distribution
\end_layout
\begin_layout Standard
The beta distribution,
\begin_inset Formula $Beta(\alpha,\beta)$
\end_inset
, has two parameters which jointly define its shape.
Like the uniform distribution, the beta distribution is defined on a closed
interval.
For simplicity, we consider only the version that is defined on the domain
\begin_inset Formula $[0,1]$
\end_inset
.
\end_layout
\begin_layout Standard
The beta's parameters can be adjusted to dramatically change its appearance.
It can be single peaked, skewed, or two peaked.
Three example beta distributions are presented in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:BetaDistributions"
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(0,1,by=.005)
\end_layout
\begin_layout Plain Layout
b1 <- c(3, 0.7, 1.2)
\end_layout
\begin_layout Plain Layout
b2 <- c(5.6, 0.58, 0.2)
\end_layout
\begin_layout Plain Layout
pbeta1 <- dbeta(x, b1[1],b2[1])
\end_layout
\begin_layout Plain Layout
pbeta2 <- dbeta(x, b1[2],b2[2])
\end_layout
\begin_layout Plain Layout
pbeta3 <- dbeta(x, b1[3],b2[3])
\end_layout
\begin_layout Plain Layout
plot(x, pbeta1, type="n", xlab="x",ylab="Probability Density",ylim=c(0,4))
\end_layout
\begin_layout Plain Layout
lines(x,pbeta1, lty=1)
\end_layout
\begin_layout Plain Layout
lines(x,pbeta2, lty=2)
\end_layout
\begin_layout Plain Layout
lines(x,pbeta3, lty=3)
\end_layout
\begin_layout Plain Layout
legend(0.55, 3.5, legend=c("Beta(3,5.6)","Beta(0.7, 0.58)","Beta(1.2,0.2)"),lty=1:3)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tBeta10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Beta Density Functions
\begin_inset CommandInset label
LatexCommand label
name "fig:BetaDistributions"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
The standard
\begin_inset Formula $Beta$
\end_inset
's pdf is defined on
\begin_inset Formula $[0,1]$
\end_inset
:
\begin_inset Formula
\begin{equation}
f(x;\alpha,\beta)=\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1},\, where\, x\in[0,1]\label{eq:BetaDensity}
\end{equation}
\end_inset
and the normalizing constant is called the beta function
\begin_inset Formula
\begin{equation}
B(\alpha,\beta)=\int_{0}^{1}t^{\alpha-1}(1-t)^{\beta-1}dt
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Incidentally, the beta function is equal to a ratio of gamma functions,
\begin_inset Formula
\begin{equation}
B(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
and the fraction formed by two gamma variables that have the same scale
parameter,
\begin_inset Formula $x_{1}/(x_{1}+x_{2})$
\end_inset
, is distributed as a beta variable.
\end_layout
\begin_layout Subsubsection
Cumulative Distribution Function
\end_layout
\begin_layout Standard
The chance that a draw from a beta density is less than
\begin_inset Formula $k$
\end_inset
is
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
F(k;\alpha,\beta)=\frac{1}{B(\alpha,\beta)}\int_{0}^{k}x^{\alpha-1}(1-x)^{\beta-1}dx\label{eq:BetaCDF}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Sometimes the part on the right (the integral) is called the incomplete
beta function, but as far as I can see, no simplifying analytical benefit
is had by relabeling it.
Generally, it can only be calculated by numerical integration.
\end_layout
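\begin_layout Standard
To illustrate that numerical integration (a sketch; the Beta(2,3) parameters and the cutoff 0.4 are invented for this example), R's integrate function applied to the density agrees with the built-in cdf:
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
f <- function(x) dbeta(x, 2, 3)   ## the Beta(2,3) density
\end_layout
\begin_layout Plain Layout
integrate(f, 0, 0.4)$value        ## numerical integral from 0 up to k=0.4
\end_layout
\begin_layout Plain Layout
pbeta(0.4, 2, 3)                  ## the built-in cdf reports the same area
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout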
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
The expected value of a variable that is beta distributed is:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x]=\mu=\frac{\alpha}{\alpha+\beta}\label{eq:BetaMean}
\end{equation}
\end_inset
and the variance is
\begin_inset Formula
\begin{equation}
Var[x]=\frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}\label{eq:BetaVariance}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
If
\begin_inset Formula $\alpha>1$
\end_inset
and
\begin_inset Formula $\beta>1$
\end_inset
, the peak of the density is in the interior of [0,1].
In that case, the mode of the
\begin_inset Formula $Beta$
\end_inset
distribution is
\begin_inset Formula
\begin{equation}
mode=\gamma=\frac{\alpha-1}{\alpha+\beta-2}\label{eq:BetaMode}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
If
\begin_inset Formula $\alpha$
\end_inset
or
\begin_inset Formula $\beta<1$
\end_inset
, the mode may be at an edge.
\end_layout
\begin_layout Standard
If
\begin_inset Formula $\alpha=\beta=1$
\end_inset
, then the beta is identical to a uniform distribution.
\end_layout
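\begin_layout Standard
A small simulation (the parameters here are invented for illustration) confirms the mean and variance formulas:
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
a <- 2; b <- 5
\end_layout
\begin_layout Plain Layout
draws <- rbeta(100000, a, b)
\end_layout
\begin_layout Plain Layout
mean(draws)   ## close to a/(a+b) = 2/7
\end_layout
\begin_layout Plain Layout
var(draws)    ## close to a*b/((a+b)^2*(a+b+1))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout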
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
The two parameters that determine the shape and nature of a beta distribution
are not as intuitively meaningful as the parameters of the normal.
\end_layout
\begin_layout Standard
The beta distribution fits into a larger mosaic of probability models, but
in my experience it has two especially important uses.
First, it can summarize our beliefs about how likely something is to
\begin_inset Quotes eld
\end_inset
be true
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
to occur.
\begin_inset Quotes erd
\end_inset
Since the beta's formula is so flexible, it can describe virtually any
shape that we might subjectively impose in a model.
\end_layout
\begin_layout Standard
Second, the beta can be used as an output variable in a regression modeling
exercise.
If a dependent variable is a proportion, then it may naturally be interpreted
as a draw from a beta distribution.
Predictors are used in an effort to account for the fact that the observed
proportion is high for some units and low for others.
\end_layout
\begin_layout Subsection
Chi-Squared
\end_layout
\begin_layout Standard
The Chi-Squared distribution depends on only one parameter, which I will
refer to by the Greek letter
\begin_inset Formula $\nu$
\end_inset
(pronounced
\begin_inset Quotes eld
\end_inset
nu
\begin_inset Quotes erd
\end_inset
).
The Chi-Squared may be referred to as
\begin_inset Formula $\chi^{2}(\nu)$
\end_inset
or
\begin_inset Formula $\chi_{\nu}^{2}$
\end_inset
.
The
\begin_inset Formula $\chi^{2}$
\end_inset
describes a variable that is defined on the interval
\begin_inset Formula $[0,\infty)$
\end_inset
.
\end_layout
\begin_layout Standard
The Chi-Squared distribution is used to describe the
\begin_inset Quotes eld
\end_inset
sum of squared mistakes
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
mismatches
\begin_inset Quotes erd
\end_inset
between expectation and observation.
If that sum is very small (close to 0), it means the cumulative mismatch
between the observations and the expectations inspired by a model is small.
In all kinds of regression modeling, we often need to decide if one model
is
\begin_inset Quotes eld
\end_inset
closer
\begin_inset Quotes erd
\end_inset
to the data than another.
The difference between the models usually boils down to a Chi-Squared statistic.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
xvals <- seq(0, 30, length.out=1000)
\end_layout
\begin_layout Plain Layout
chisquare1 <- dchisq(xvals, df=5)
\end_layout
\begin_layout Plain Layout
chisquare2 <- dchisq(xvals, df=10)
\end_layout
\begin_layout Plain Layout
chisquare3 <- dchisq(xvals, df=20)
\end_layout
\begin_layout Plain Layout
plot(xvals, chisquare1, type="l", xlab=expression(chi^2), ylab="probability density", ylim=c(0,0.4), main="")
\end_layout
\begin_layout Plain Layout
lines(xvals, chisquare2, lty=2)
\end_layout
\begin_layout Plain Layout
lines(xvals, chisquare3, lty=3)
\end_layout
\begin_layout Plain Layout
legend("topright", legend=c(expression(nu==5), expression(nu==10), expression(nu==20)), lty=1:3)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tChiSquare1}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Density Function of
\begin_inset Formula $\chi^{2}$
\end_inset
\begin_inset CommandInset label
LatexCommand label
name "fig:ChiSquare1"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
As one can see, as
\begin_inset Formula $\nu$
\end_inset
grows larger, the concentration of density shifts to the right and becomes
more symmetric.
\end_layout
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
In the discussion of the gamma distribution, it was already mentioned that
the pdf of a Chi-square distribution is identical to a gamma distribution
with shape parameter
\begin_inset Formula $\nu/2$
\end_inset
and scale
\begin_inset Formula $2$
\end_inset
.
Filling in the blanks in expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaPDF"
\end_inset
), the Chi-square's probability density function is
\begin_inset Formula
\begin{equation}
f(x)=\frac{1}{\Gamma(\frac{\nu}{2})2^{\frac{\nu}{2}}}x^{\frac{\nu}{2}-1}e^{-x/2},\, x\geq0,\,\nu>0.
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
This seems somewhat anticlimactic, but the story does not end there.
The best is yet to come.
Here it is in a nutshell: a sum of squared standard normal variables follows
a Chi-square distribution.
\end_layout
\begin_layout Standard
Here is the big idea.
Draw a collection of
\begin_inset Formula $\nu$
\end_inset
observations from a standard normal distribution,
\begin_inset Formula
\begin{equation}
Z_{i}\sim N(0,1),\, for\, i=1,2,\ldots,\nu.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Square each one, and add them together.
The result is distributed as a
\begin_inset Formula $\chi^{2}(\nu)$
\end_inset
.
That is to say
\begin_inset Formula
\begin{equation}
Z_{1}^{2}+\ldots+Z_{\nu}^{2}\sim\chi^{2}(\nu).
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
When it is used in this context, the parameter that represents sample size,
\begin_inset Formula $\nu$
\end_inset
, is often called
\begin_inset Quotes eld
\end_inset
degrees of freedom.
\begin_inset Quotes erd
\end_inset
\end_layout
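\begin_layout Standard
That claim is easy to check by simulation (a sketch; the choice of 5 degrees of freedom and 10000 replications is arbitrary):
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
z <- matrix(rnorm(5 * 10000), nrow=5)  ## 10000 samples of nu=5 standard normals
\end_layout
\begin_layout Plain Layout
s <- colSums(z^2)                      ## sum of squares within each sample
\end_layout
\begin_layout Plain Layout
mean(s)   ## close to nu = 5, the expected value of chi-squared(5)
\end_layout
\begin_layout Plain Layout
var(s)    ## close to 2*nu = 10
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout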
\begin_layout Subsubsection
Cumulative Distribution Function
\end_layout
\begin_layout Standard
The cumulative distribution of a
\begin_inset Formula $\chi^{2}(\nu)$
\end_inset
variable does not reduce to a simple formula, in the same way that the
cdf of
\begin_inset Formula $Gamma(\alpha,\beta)$
\end_inset
is not simple.
Fortunately, statisticians have found numerical methods to approximate
the cumulative probabilities of the
\begin_inset Formula $\chi^{2}$
\end_inset
.
\end_layout
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
Since the
\begin_inset Formula $\chi^{2}(\nu)$
\end_inset
is the same as
\begin_inset Formula $Gamma(\frac{\nu}{2},2)$
\end_inset
, we can use the formulas (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaEx"
\end_inset
) and (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:GammaVar"
\end_inset
).
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x]=\nu
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
Var[x]=2\nu
\end{equation}
\end_inset
\end_layout
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
Many statistical procedures can result in an estimate that is distributed
as
\begin_inset Formula $\chi^{2}(\nu)$
\end_inset
.
The mismatch between the saturated model and the fitted generalized linear
model, for example, is distributed as a
\begin_inset Formula $\chi^{2}$
\end_inset
.
The squared mismatch between the observed and predicted counts in a cross
tabulation table is also distributed as a
\begin_inset Formula $\chi^{2}$
\end_inset
.
\end_layout
\begin_layout Standard
When we calculate some estimate from a sample, say we call it
\begin_inset Formula $\hat{k}$
\end_inset
, it is important for us to find out if that estimate is
\begin_inset Quotes eld
\end_inset
in the middle of the usual range
\begin_inset Quotes erd
\end_inset
or if it is in an extreme tail of the possibilities.
If we can calculate the proportion of cases smaller than
\begin_inset Formula $\hat{k}$
\end_inset
,
\begin_inset Formula $F(\hat{k};\nu)$
\end_inset
, then it is obvious we can also calculate the proportion of cases that are greater
than
\begin_inset Formula $\hat{k}$
\end_inset
,
\begin_inset Formula $1-F(\hat{k};\nu)$
\end_inset
.
That area is represented in the following figure, in which the pdf of
\begin_inset Formula $\chi^{2}(50)$
\end_inset
is drawn.
The shaded area on the right (values greater than 67.50) represents the
top 5% of possible draws from
\begin_inset Formula $\chi^{2}(50)$
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
xvals <- seq(0, 80, length.out=1000)
\end_layout
\begin_layout Plain Layout
chisquare <- dchisq(xvals, df=50)
\end_layout
\begin_layout Plain Layout
plot(xvals, chisquare, type="l", xlab=expression(chi^2), ylab="probability density", ylim=c(0,0.10), main="")
\end_layout
\begin_layout Plain Layout
critVal <- qchisq(0.05, df=50, lower.tail=F)
\end_layout
\begin_layout Plain Layout
chiAtCrit <- dchisq(critVal, df=50)
\end_layout
\begin_layout Plain Layout
lines(c(critVal,critVal), c(0, chiAtCrit), lty=4)
\end_layout
\begin_layout Plain Layout
abline(h=0, lwd=0.5)
\end_layout
\begin_layout Plain Layout
mtext(expression(hat(k)), side=1, line=1, at=critVal)
\end_layout
\begin_layout Plain Layout
xvals <- seq(critVal, 80, length.out=50)
\end_layout
\begin_layout Plain Layout
polygon( x=c(xvals, xvals[50], sort(xvals,decreasing=T), critVal),
\end_layout
\begin_layout Plain Layout
y=c(dchisq(xvals,df=50), 0, rep(0,50), 0), col=gray(.90))
\end_layout
\begin_layout Plain Layout
text(74, 0.018, expression(area == 1 - F(hat(k))))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tChiSquare20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Extreme Values of
\begin_inset Formula $\chi^{2}(50)$
\end_inset
\begin_inset CommandInset label
LatexCommand label
name "fig:ChiSquare20"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsection
Student's t distribution
\end_layout
\begin_layout Standard
One of the most interesting stories in the folklore of statistics is that
an employee of the Guinness beer company discovered this distribution,
but his employer would not allow him to publish it under his own name.
When William S.
Gosset published
\begin_inset Quotes eld
\end_inset
The Probable Error of A Mean
\begin_inset Quotes erd
\end_inset
in 1908, he elected to use the pen name
\begin_inset Quotes eld
\end_inset
Student.
\begin_inset Quotes erd
\end_inset
The finding was not immediately recognized for its value, but the famous
statistician R.A.
Fisher popularized Student's t distribution and made it a cornerstone in
his system of hypothesis testing.
\end_layout
\begin_layout Standard
The t distribution is symmetric and unimodal.
It has one parameter,
\begin_inset Formula $\nu$
\end_inset
.
Its center point (mean, median, and mode) is always at
\begin_inset Formula $x=0.$
\end_inset
It is similar to the normal distribution.
Extreme outcomes are more likely in the t distribution.
Statisticians say that t has
\begin_inset Quotes eld
\end_inset
fatter tails
\begin_inset Quotes erd
\end_inset
than the normal.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(-4, 4, length.out=1000)
\end_layout
\begin_layout Plain Layout
px1 <- dt(x, df=1)
\end_layout
\begin_layout Plain Layout
px2 <- dt(x, df=5)
\end_layout
\begin_layout Plain Layout
px3 <- dt(x, df=20)
\end_layout
\begin_layout Plain Layout
px4 <- dt(x, df=100)
\end_layout
\begin_layout Plain Layout
plot(x, px1, xlab="t",ylab="probability density of t",type="l", ylim=c(0,0.5))
\end_layout
\begin_layout Plain Layout
lines(x,px2, lty=2)
\end_layout
\begin_layout Plain Layout
lines(x,px3, lty=3)
\end_layout
\begin_layout Plain Layout
lines(x,px4, lty=4)
\end_layout
\begin_layout Plain Layout
legend("topright",legend=c("df=1","df=5", "df=20", "df=100"),lty=1:4)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tt05}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
t Densities
\begin_inset CommandInset label
LatexCommand label
name "fig:tDensities"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
The probability density of the t distribution is
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
f(x;\nu)=\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\Gamma(\frac{\nu}{2})}\left(1+\frac{x^{2}}{\nu}\right)^{-(\frac{\nu+1}{2})}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Even if we ignore the normalizing constant at the front, we are still left
with a formidable expression.
Does it help to write this as:
\begin_inset Formula
\begin{equation}
f(x;\nu)\propto\frac{1}{\left(1+\frac{x^{2}}{\nu}\right)^{(\frac{\nu+1}{2})}}?
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
As
\begin_inset Formula $x$
\end_inset
grows larger, the denominator grows larger and so the density gets smaller.
\end_layout
\begin_layout Standard
The t distribution was developed to help deal with the following problem.
Suppose we collect a sample of data and from it we calculate estimates
of the mean and the variance (and its square root, the standard deviation).
We want to know if the observed mean is in the
\begin_inset Quotes eld
\end_inset
middle range
\begin_inset Quotes erd
\end_inset
of what we expect (close to the expected value) or if it is extreme.
If we think of this as if it were a
\begin_inset Formula $Z$
\end_inset
statistic (see expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:Zstatistic"
\end_inset
)), it seems as though we ought to be able to make a comparison, something
like
\begin_inset Formula
\begin{equation}
\frac{estimated\, mean-null\, hypothesis}{standard\, deviation\, of\, mean}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
What distribution would that estimate have? Gosset suggested that the result
is distributed according to his t distribution.
\end_layout
\begin_layout Standard
At this point, I need to explain how to calculate the standard deviation
of the mean.
Please remember I'm using the symbol
\begin_inset Formula $\widehat{E[x]}$
\end_inset
to refer to a sample mean (not the usual symbol
\begin_inset Formula $\bar{x}$
\end_inset
) and the sample variance is
\begin_inset Formula $\widehat{Var[x]}$
\end_inset
.
The variance of the estimated mean across samples is much smaller than
the variance itself.
In fact,
\begin_inset Formula
\begin{equation}
Var[\widehat{E[x]}]=\frac{1}{N}Var(x)
\end{equation}
\end_inset
Here is how that is derived.
Collect a sample and calculate the average,
\begin_inset Formula
\begin{equation}
\widehat{E[x]}=\frac{x_{1}+x_{2}+\ldots+x_{N}}{N}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Apply the variance operator to both sides, remembering that the draws are independent.
\begin_inset Formula
\begin{eqnarray}
Var[\widehat{E[x]}] & = & Var\left(\frac{x_{1}+x_{2}+\ldots+x_{N}}{N}\right)\nonumber \\
& = & \frac{1}{N^{2}}\left(Var(x_{1})+Var(x_{2})+\ldots+Var(x_{N})\right)\nonumber \\
& = & \frac{1}{N^{2}}\sum_{i=1}^{N}Var(x_{i})=\frac{1}{N}Var(x)
\end{eqnarray}
\end_inset
\begin_inset Newline newline
\end_inset
Thus, the true variance (and its square root, the standard deviation) of
an estimated mean are known, as long as the true variance of
\begin_inset Formula $x$
\end_inset
itself is known.
\end_layout
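\begin_layout Standard
That result can also be verified by simulation (a sketch; the normal sample, its standard deviation of 2, and the sample size of 25 are invented for illustration):
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
N <- 25
\end_layout
\begin_layout Plain Layout
xbars <- replicate(5000, mean(rnorm(N, mean=0, sd=2)))  ## 5000 sample means
\end_layout
\begin_layout Plain Layout
var(xbars)   ## close to Var(x)/N = 4/25
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout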
\begin_layout Standard
In practice, the true variance of
\begin_inset Formula $x$
\end_inset
is not known, and thus we are wrestling with the fact that both the mean
and the variance must be estimated from the same sample.
What if we could proceed
\emph on
as if
\emph default
the estimated standard deviation of the mean were actually correct? Gosset
charted out a plan to do just that.
In his own words, he sought to find the
\begin_inset Quotes eld
\end_inset
standard deviation of the standard deviation
\begin_inset Quotes erd
\end_inset
across samples so as to appreciate the ratio of the estimate to its estimated
standard deviation.
\end_layout
\begin_layout Standard
Today, we think of the problem like this.
A standard normal variable could be created if we knew the true variance,
as in
\begin_inset Formula
\begin{equation}
\frac{\widehat{E[x]}-E[x]}{\sqrt{Var[x]/N}}\sim N(0,1)\label{eq:tNumerator}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Don't worry that
\begin_inset Formula $Var[x]$
\end_inset
is unknown, we will find a way to cancel it out.
The ratio of the estimated variance to true variance is proportional to
a
\begin_inset Formula $\chi^{2}$
\end_inset
variable with
\begin_inset Formula $\nu=N$
\end_inset
.
\begin_inset Formula
\begin{equation}
\frac{\widehat{Var[x]}}{Var[x]}\sim\frac{1}{N}\chi^{2}(N)\label{eq:tDenominator}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Divide (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:tNumerator"
\end_inset
) by the square root of (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:tDenominator"
\end_inset
),
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\frac{\widehat{E[x]}-E[x]}{\sqrt{Var[x]/N}}\div\sqrt{\frac{\widehat{Var[x]}}{Var[x]}}=\frac{\widehat{E[x]}-E[x]}{\sqrt{\widehat{Var[x]}/N}}=\frac{\widehat{E[x]}-E[x]}{\widehat{StdDev[x]}/\sqrt{N}}.\label{eq:tNumerator1}
\end{equation}
\end_inset
The unknown
\begin_inset Formula $Var[x]$
\end_inset
disappears, and we are left with exactly the result we were looking for.
It looks like a
\begin_inset Formula $Z$
\end_inset
statistic, but we can use an estimate of the variance.
We call the denominator,
\begin_inset Formula $\widehat{StdDev[x]}/\sqrt{N}$
\end_inset
, the
\begin_inset Quotes eld
\end_inset
standard error of the mean
\begin_inset Quotes erd
\end_inset
because it is an estimate of the standard deviation of the mean (not the
true standard deviation of the mean).
\end_layout
\begin_layout Standard
When a sample is large, then the t ratio described in expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:tratio"
\end_inset
) and the standard normal (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:Zstatistic"
\end_inset
) are not noticeably different.
As illustrated in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:NormalTCoincide"
\end_inset
, as
\begin_inset Formula $\nu$
\end_inset
is increased, the
\begin_inset Formula $t(\nu)$
\end_inset
converges to the standard normal distribution.
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
x <- seq(0, 4, length.out=200)
\end_layout
\begin_layout Plain Layout
y <- matrix(0, ncol=5, nrow=200)
\end_layout
\begin_layout Plain Layout
y[,1] <- dt(x, df=1)
\end_layout
\begin_layout Plain Layout
y[,2] <- dt(x, df=2)
\end_layout
\begin_layout Plain Layout
y[,3] <- dt(x, df=5)
\end_layout
\begin_layout Plain Layout
y[,4] <- dt(x, df=20)
\end_layout
\begin_layout Plain Layout
y[,5] <- dt(x, df=1000)
\end_layout
\begin_layout Plain Layout
matplot(x,y, type="l",ylab="probability density", col="black")
\end_layout
\begin_layout Plain Layout
lines(x, dnorm(x),lty=2, lwd=3)
\end_layout
\begin_layout Plain Layout
text(0, 0.225, expression(t(nu==1)),pos=4)
\end_layout
\begin_layout Plain Layout
text(0.2, 0.33, expression(t(nu==2)),pos=4)
\end_layout
\begin_layout Plain Layout
text(1.0, 0.25, pos=4, expression(t(nu==1000)))
\end_layout
\begin_layout Plain Layout
text(1.05, 0.23, pos=4, "N(0,1)")
\end_layout
\begin_layout Plain Layout
legend("topright", legend=c(expression(nu==1), expression(nu==2), expression(nu==5), expression(nu==20), expression(nu==1000), "N(0,1)"), lty=c(1:5,2), lwd=c(1,1,1,1,1,3))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset Graphics
filename plots/tt10.pdf
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Normal(0,1) and
\begin_inset Formula $t(1000)$
\end_inset
Coincide
\begin_inset CommandInset label
LatexCommand label
name "fig:NormalTCoincide"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Cumulative Distribution Function
\end_layout
\begin_layout Standard
Like most of the other distributions that have been discussed here, there
is no simple closed form with which to calculate the cdf of a t statistic.
Because the cumulative probability of a
\begin_inset Formula $t$
\end_inset
is difficult to calculate, statistics books have historically included
a table against which test values can be compared.
\end_layout
\begin_layout Standard
One complication worth mentioning about the cdf is that the
\begin_inset Formula $t$
\end_inset
distribution is usually thought of as a two-tailed distribution.
That is, the sample estimate
\begin_inset Formula $\widehat{E[x]}$
\end_inset
may be grossly wrong on the low side, or on the high side.
Unlike the
\begin_inset Formula $\chi^{2}$
\end_inset
distribution, pictured in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:ChiSquare20"
\end_inset
, where we look only on the right tail of the distribution for evidence
of unusual cases, the
\begin_inset Formula $t$
\end_inset
distribution has critical regions in both tails.
Consider Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:tTwoTailed"
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
mint <- -4; maxt <- 4
\end_layout
\begin_layout Plain Layout
x <- seq(mint, maxt, length.out=1000)
\end_layout
\begin_layout Plain Layout
myt <- dt(x, df=50)
\end_layout
\begin_layout Plain Layout
plot(x, myt, type="l", xlab="t", ylab="probability density", ylim=c(0,0.40), main="")
\end_layout
\begin_layout Plain Layout
abline(h=0, lwd=0.5)
\end_layout
\begin_layout Plain Layout
critValH <- qt(0.025, df=50, lower.tail=F)
\end_layout
\begin_layout Plain Layout
statAtCritH <- dt(critValH, df=50)
\end_layout
\begin_layout Plain Layout
lines(c(critValH,critValH), c(0, statAtCritH), lty=4)
\end_layout
\begin_layout Plain Layout
xvals <- seq(critValH, maxt, length.out=50)
\end_layout
\begin_layout Plain Layout
polygon( x=c(xvals, xvals[50], sort(xvals,decreasing=T), critValH),
\end_layout
\begin_layout Plain Layout
y=c(dt(xvals,df=50), 0, rep(0,50), 0), col=gray(.90))
\end_layout
\begin_layout Plain Layout
## Repeat the same steps for the lower tail
\end_layout
\begin_layout Plain Layout
critValL <- qt(0.025, df=50, lower.tail=T)
\end_layout
\begin_layout Plain Layout
statAtCritL <- dt(critValL, df=50)
\end_layout
\begin_layout Plain Layout
lines(c(critValL,critValL), c(0, statAtCritL), lty=4)
\end_layout
\begin_layout Plain Layout
xvals <- seq(mint, critValL, length.out=50)
\end_layout
\begin_layout Plain Layout
polygon( x=c(xvals, xvals[50], mint),
\end_layout
\begin_layout Plain Layout
y=c(dt(xvals,df=50), 0, 0), col=gray(.90))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tt30}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Extreme Values of
\begin_inset Formula $t(50)$
\end_inset
\begin_inset CommandInset label
LatexCommand label
name "fig:tTwoTailed"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
Supposing
\begin_inset Formula $\nu>1$
\end_inset
, the expected value, median, and mode of a t distribution are all 0.
The variance of a t distribution, defined when
\begin_inset Formula $\nu>2$
\end_inset
, is
\begin_inset Formula
\begin{equation}
Var[x]=\frac{\nu}{\nu-2}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
It is worth noting that as
\begin_inset Formula $\nu\rightarrow\infty$
\end_inset
,
\begin_inset Formula $Var[x]\rightarrow1.0$
\end_inset
, consistent with the claim that the t density converges to
\begin_inset Formula $N(0,1)$
\end_inset
.
\end_layout
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
The t distribution is thus a handy way to find out if the average from a
sample is out of line with expectations.
That's important, but not nearly so important as the t distribution would
become.
When he popularized Student's t distribution, R.A.
Fisher proposed the
\begin_inset Formula $t$
\end_inset
as a distribution for analysis of a much larger class of problems.
Basically, any problem in which the sample-based estimate is normally distributed
may be compared against the t distribution, as long as we can find a
standard error to use in the denominator.
The term
\begin_inset Quotes eld
\end_inset
t ratio
\begin_inset Quotes erd
\end_inset
refers generally to the comparison of any estimator,
\begin_inset Formula $\hat{\theta}$
\end_inset
, for a parameter
\begin_inset Formula $\theta$
\end_inset
, against its standard error.
\begin_inset Formula
\begin{equation}
\frac{\hat{\theta}-E[\theta]}{standard\, error(\hat{\theta})}\sim t(\nu).\label{eq:tratio}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
There is usually some work to do when deciding what the
\begin_inset Formula $\nu$
\end_inset
parameter should be, but it is almost always
\begin_inset Formula $N-something$
\end_inset
, and in most common situations, it is considered a solved problem.
\end_layout
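\begin_layout Standard
For example (a sketch with an invented normal sample), the t ratio computed by hand agrees with the statistic that R's t.test reports:
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
x <- rnorm(20, mean=1)                        ## invented sample of N=20
\end_layout
\begin_layout Plain Layout
(mean(x) - 0) / (sd(x) / sqrt(length(x)))     ## hand-computed t ratio against mu=0
\end_layout
\begin_layout Plain Layout
t.test(x, mu=0)$statistic                     ## R's built-in test gives the same value
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout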
\begin_layout Subsection
The F distribution
\end_layout
\begin_layout Standard
The
\begin_inset Formula $F(\nu_{1},\nu_{2})$
\end_inset
distribution (
\begin_inset Quotes eld
\end_inset
F
\begin_inset Quotes erd
\end_inset
is for Fisher) describes a variable on
\begin_inset Formula $[0,\infty)$
\end_inset
.
It depends on 2 parameters,
\begin_inset Formula $\nu_{1}$
\end_inset
and
\begin_inset Formula $\nu_{2}$
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<>=
\end_layout
\begin_layout Plain Layout
nu1 <- c(25, 100, 200)
\end_layout
\begin_layout Plain Layout
nu2 <- c(25, 50, 100, 200)
\end_layout
\begin_layout Plain Layout
x <- seq(0.2, 5, length = 200)
\end_layout
\begin_layout Plain Layout
dF11 <- df(x, nu1[1], nu2[1])
\end_layout
\begin_layout Plain Layout
dF22 <- df(x, nu1[1], nu2[2])
\end_layout
\begin_layout Plain Layout
dF23 <- df(x, nu1[2], nu2[3])
\end_layout
\begin_layout Plain Layout
dF31 <- df(x, nu1[2], nu2[4])
\end_layout
\begin_layout Plain Layout
dF33 <- df(x, nu1[3], nu2[3])
\end_layout
\begin_layout Plain Layout
plot(x, dF11, ylab = "Probability Density", main = "",
\end_layout
\begin_layout Plain Layout
type = "l", ylim=c(0,1.5))
\end_layout
\begin_layout Plain Layout
lines(x, dF22, lty = 2)
\end_layout
\begin_layout Plain Layout
lines(x, dF23, lty = 3)
\end_layout
\begin_layout Plain Layout
lines(x, dF31, lty = 4)
\end_layout
\begin_layout Plain Layout
lines(x, dF33, lty = 5)
\end_layout
\begin_layout Plain Layout
legend("topright", legend =
\end_layout
\begin_layout Plain Layout
c(expression(paste(nu[1]==25,",", nu[2]==25)),
\end_layout
\begin_layout Plain Layout
expression(paste(nu[1]==25,",", nu[2]==50)),
\end_layout
\begin_layout Plain Layout
expression(paste(nu[1]==100,",", nu[2]==100)),
\end_layout
\begin_layout Plain Layout
expression(paste(nu[1]==100,",", nu[2]==200)),
\end_layout
\begin_layout Plain Layout
expression(paste(nu[1]==200, ",", nu[2]==100))), lty=1:5)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tF20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Density of
\begin_inset Formula $F(\nu_{1},\nu_{2})$
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Probability Density Function
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
f(x;\nu_{1},\nu_{2})=\frac{\Gamma\left(\frac{\nu_{1}+\nu_{2}}{2}\right)}{\Gamma(\frac{\nu_{1}}{2})\Gamma(\frac{\nu_{2}}{2})}\nu_{1}^{\nu_{1}/2}\nu_{2}^{\nu_{2}/2}x^{\frac{\nu_{1}}{2}-1}\left(\nu_{2}+\nu_{1}x\right)^{-(\nu_{1}+\nu_{2})/2}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
which may be rearranged as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
f(x;\nu_{1},\nu_{2})=\frac{\Gamma\left(\frac{\nu_{1}+\nu_{2}}{2}\right)}{\Gamma(\frac{\nu_{1}}{2})\Gamma(\frac{\nu_{2}}{2})}\left(\frac{\nu_{1}}{\nu_{2}}\right)^{\nu_{1}/2}x^{\frac{\nu_{1}}{2}-1}\left(1+\frac{\nu_{1}}{\nu_{2}}x\right)^{-(\nu_{1}+\nu_{2})/2}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
I've tried to find a way to simplify that so that it would carry some intuition,
but I have failed entirely.
Perhaps I can make the effort worthwhile by explaining why that distribution
has the shape that it does.
\end_layout
\begin_layout Standard
Here is where the F distribution comes from.
Suppose we begin with the goal of comparing two samples of observations.
We already know that
\begin_inset Formula $Z_{1}^{2}+Z_{2}^{2}+\ldots+Z_{\nu_{1}}^{2}$
\end_inset
is distributed as a
\begin_inset Formula $\chi^{2}(\nu_{1})$
\end_inset
.
How should we compare that against a second set of observations, one for
which the sum of squares is
\begin_inset Formula $\chi^{2}(\nu_{2})$
\end_inset
? So far as I know, there is no method for comparing the difference of
two
\begin_inset Formula $\chi^{2}$
\end_inset
statistics, but it is possible to compare their ratio.
If one sample size, say
\begin_inset Formula $\nu_{1}$
\end_inset
, is significantly larger than the other one, then it seems obvious that
its sum of squares will be larger, even if the cases are not more widely
dispersed.
In order to bring two sums of squares into a comparable state, we must
divide the
\begin_inset Formula $\chi^{2}$
\end_inset
distributed sum by the number of scores.
The test statistic we want to understand is thus a ratio of
\begin_inset Quotes eld
\end_inset
mean squares
\begin_inset Quotes erd
\end_inset
:
\begin_inset Formula
\begin{equation}
\frac{Sample\,1:\,\,\,(Z_{1}^{2}+Z_{2}^{2}+\ldots+Z_{\nu_{1}}^{2})/\nu_{1}}{Sample\,2:\,\,\,(Z_{1}^{2}+Z_{2}^{2}+\ldots+Z_{\nu_{2}}^{2})/\nu_{2}}.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
The pdf of
\begin_inset Formula $F(\nu_{1},\nu_{2})$
\end_inset
represents the diversity we would observe if we repeatedly drew
\begin_inset Formula $\nu_{1}$
\end_inset
and
\begin_inset Formula $\nu_{2}$
\end_inset
observations and then formed this ratio of mean squares.
If the two samples are indeed drawn from a standard normal distribution,
then we expect the ratio to be approximately 1.0, with some variation
above and below.
\end_layout
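\begin_layout Standard
That story can be checked by simulation. The following chunk is only a sketch (the degrees of freedom and the number of replications are chosen arbitrarily for illustration); it builds the ratio of mean squares from standard normal draws and compares its average against draws from R's rf() generator.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
set.seed(1234)
\end_layout
\begin_layout Plain Layout
nu1 <- 10; nu2 <- 20
\end_layout
\begin_layout Plain Layout
## ratio of two independent mean squares of standard normal draws
\end_layout
\begin_layout Plain Layout
msratio <- replicate(5000, (sum(rnorm(nu1)^2)/nu1) / (sum(rnorm(nu2)^2)/nu2))
\end_layout
\begin_layout Plain Layout
mean(msratio)          ## roughly nu2/(nu2 - 2) = 1.11
\end_layout
\begin_layout Plain Layout
mean(rf(5000, nu1, nu2))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout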
\begin_layout Standard
The density associated with example values
\begin_inset Formula $\nu_{1}=\nu_{2}=\nu$
\end_inset
is presented in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Fnu1Equalnu2"
\end_inset
.
If
\begin_inset Formula $\nu_{1}=1$
\end_inset
, then the density of
\begin_inset Formula $F$
\end_inset
is the same as that of a squared
\begin_inset Formula $t$
\end_inset
variable.
\end_layout
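\begin_layout Standard
That relationship between the squared t and the F is easy to verify numerically. In the following sketch (the degrees of freedom are chosen arbitrarily), the probability that a squared t variable falls below a cutoff is compared with the F(1, nu) cumulative probability; the discrepancy should be on the order of machine precision.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
nu <- 12
\end_layout
\begin_layout Plain Layout
x <- seq(0.1, 4, by = 0.1)
\end_layout
\begin_layout Plain Layout
## P(t^2 <= x^2) = P(-x <= t <= x) = 2*pt(x, nu) - 1
\end_layout
\begin_layout Plain Layout
max(abs(pf(x^2, 1, nu) - (2*pt(x, nu) - 1)))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout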
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
nu1 <- c(1,25,100,1000)
\end_layout
\begin_layout Plain Layout
x <- seq(0.1, 6, length = 200)
\end_layout
\begin_layout Plain Layout
dF11 <- df(x, nu1[1], nu1[1])
\end_layout
\begin_layout Plain Layout
dF22 <- df(x, nu1[2], nu1[2])
\end_layout
\begin_layout Plain Layout
dF33 <- df(x, nu1[3], nu1[3])
\end_layout
\begin_layout Plain Layout
dF44 <- df(x, nu1[4], nu1[4])
\end_layout
\begin_layout Plain Layout
plot(x, dF11, ylab = "Probability Density", main = "",
\end_layout
\begin_layout Plain Layout
type = "l", ylim=c(0,1.5))
\end_layout
\begin_layout Plain Layout
lines(x, dF22, lty = 2)
\end_layout
\begin_layout Plain Layout
lines(x, dF33, lty = 3)
\end_layout
\begin_layout Plain Layout
lines(x, dF44, lty = 4)
\end_layout
\begin_layout Plain Layout
abline(h=0, lwd=0.3, col=gray(.9))
\end_layout
\begin_layout Plain Layout
legend("topright", legend =
\end_layout
\begin_layout Plain Layout
c(expression(paste(nu[1]==1,",", nu[2]==1)),
\end_layout
\begin_layout Plain Layout
expression(paste(nu[1]==25,",", nu[2]==25)),
\end_layout
\begin_layout Plain Layout
expression(paste(nu[1]==100,",", nu[2]==100)),
\end_layout
\begin_layout Plain Layout
expression(paste(nu[1]==1000,",", nu[2]==1000))), lty=1:4)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tF10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Density of
\begin_inset Formula $F(\nu_{1},\nu_{2})$
\end_inset
when
\begin_inset Formula $\nu_{1}=\nu_{2}$
\end_inset
\begin_inset CommandInset label
LatexCommand label
name "fig:Fnu1Equalnu2"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
Suppose we collect samples of data from
\begin_inset Formula $\nu_{1}$
\end_inset
men and
\begin_inset Formula $\nu_{2}$
\end_inset
women.
We'd like to know if the diversity of responses from men is greater than
that of women.
For each group, we calculate the
\begin_inset Quotes eld
\end_inset
mean squares
\begin_inset Quotes erd
\end_inset
(the estimates of the variance) and compare them.
Obviously, if the ratio is 1.0, there is no question, the two are about
the same.
But what if the ratio is 1.5? Is that ratio so large that we would think it
is inconsistent with the idea that the variances among men and women are
the same? That is the sort of test for which the
\begin_inset Formula $F$
\end_inset
works well.
\end_layout
\begin_layout Standard
The t, the
\begin_inset Formula $\chi^{2}$
\end_inset
, and the
\begin_inset Formula $F$
\end_inset
are a power trio in hypothesis testing.
They are, by far, the three most frequently used distributions.
The t distribution represents a ratio of an estimate to its standard error.
The
\begin_inset Formula $\chi^{2}$
\end_inset
summarizes the distribution of a sum of squares.
The
\begin_inset Formula $F$
\end_inset
distribution can be used to analyze the
\emph on
ratio
\emph default
of two sums of squares.
If the observed ratio is large, it means that one sum of squares is substantially larger than another.
The
\begin_inset Formula $F$
\end_inset
test compares the mismatches of 2 models and offers one way to decide if
one
\begin_inset Quotes eld
\end_inset
fits worse
\begin_inset Quotes erd
\end_inset
than another.
\end_layout
\begin_layout Subsection
Binomial Distribution
\end_layout
\begin_layout Standard
The binomial distribution,
\begin_inset Formula $B(N,p)$
\end_inset
, represents the number of
\begin_inset Quotes eld
\end_inset
events
\begin_inset Quotes erd
\end_inset
(or
\begin_inset Quotes eld
\end_inset
successes
\begin_inset Quotes erd
\end_inset
, or
\begin_inset Quotes eld
\end_inset
wins
\begin_inset Quotes erd
\end_inset
, etc.) that occur when there are
\begin_inset Formula $N$
\end_inset
\begin_inset Quotes eld
\end_inset
trials
\begin_inset Quotes erd
\end_inset
(opportunities for an event, success, wins, etc.) and the chance of a success
on each trial is fixed at
\begin_inset Formula $p$
\end_inset
.
\end_layout
\begin_layout Standard
I have a special coin that returns a
\begin_inset Quotes eld
\end_inset
head
\begin_inset Quotes erd
\end_inset
two-thirds of the time.
What is the probability that I will get any given number of heads after
flipping 10 times? The binomial distribution gives the answer.
Two depictions of the result are presented in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Binomial10"
\end_inset
.
The plot on the left highlights the fact that the outcomes are discrete
steps, not real-valued outcomes.
However, I fancy the plot on the right because it fits together more closely
with the continuous distributions that we have studied so far.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
par(mfcol=c(1,2))
\end_layout
\begin_layout Plain Layout
x <- 0:10
\end_layout
\begin_layout Plain Layout
y <- dbinom(x, p=0.66, size=10)
\end_layout
\begin_layout Plain Layout
plot(x,y, type="h", lty=4, xlab="10 Flips with a Biased Coin", ylab="Chance of Observing x Heads")
\end_layout
\begin_layout Plain Layout
points(x,y,pch=16)
\end_layout
\begin_layout Plain Layout
y <- c(y[1],y)
\end_layout
\begin_layout Plain Layout
x <- c(-1, x)
\end_layout
\begin_layout Plain Layout
plot(x+0.5,y, type="s", lty=4, xlab="10 Flips with a Biased Coin", ylab="Chance of Observing x Heads")
\end_layout
\begin_layout Plain Layout
par(mfcol=c(1,1))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tBinomial10}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
10 Coin Flips
\begin_inset CommandInset label
LatexCommand label
name "fig:Binomial10"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Probability Mass Function
\end_layout
\begin_layout Standard
The Binomial probability mass function is:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
Prob(k|N,\pi)=\frac{N!}{(N-k)!k!}\pi^{k}(1-\pi)^{N-k}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
It is pretty easy to derive this distribution.
Suppose there are
\begin_inset Formula $N$
\end_inset
independent trials, and we wonder how likely we are to get
\begin_inset Formula $k$
\end_inset
successes.
The chance that the first
\begin_inset Formula $k$
\end_inset
trials will succeed, and the rest will fail, is
\begin_inset Formula
\begin{eqnarray*}
\pi\times\pi\times\{k\, times\}\times(1-\pi)\times(1-\pi)\times\{N-k\, times\}\\
=\pi^{k}(1-\pi)^{N-k}
\end{eqnarray*}
\end_inset
\begin_inset Newline newline
\end_inset
That accounts for the second part of the binomial formula, but this is not
quite done.
There are many other ways to get
\begin_inset Formula $k$
\end_inset
successes, and so we have to count all of the possible sequences.
That's where the prefix comes from.
It is the binomial coefficient.
\begin_inset Formula $\frac{N!}{(N-k)!k!}$
\end_inset
is the number of ways to rearrange
\begin_inset Formula $N$
\end_inset
things so that
\begin_inset Formula $k$
\end_inset
are successes and
\begin_inset Formula $N-k$
\end_inset
are not.
\end_layout
\begin_layout Standard
When
\begin_inset Formula $N$
\end_inset
is large, the binomial distribution is quite similar to a normal distribution.
Let's consider an example.
Suppose the chance of having a boy baby is 0.63 for all women in a community.
If 437 women have babies, what is the probability that there will be 200
boys?
\end_layout
\begin_layout Example*
Inserting
\begin_inset Formula $N$
\end_inset
and
\begin_inset Formula $\pi$
\end_inset
into the previous expression, the chance of
\begin_inset Formula $k$
\end_inset
successes is seen to be:
\end_layout
\begin_layout Example*
\begin_inset Formula
\begin{equation}
Prob(k|437,0.63)=\frac{437!}{(437-k)!k!}(0.63)^{k}(1-\pi)^{437-k}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
If we had asked for the probability of 300 boys, we would find:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
P(300|437,0.63)=0.0001122501
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
I've done some
\begin_inset Quotes eld
\end_inset
hunting and pecking
\begin_inset Quotes erd
\end_inset
with this distribution to find out which values of
\begin_inset Formula $k$
\end_inset
are most likely.
The outcomes with noticeable chances are between 240 and 310, as indicated
in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:BinomialwithN=437"
\end_inset
.
There is a mathematical proof of the fact that as
\begin_inset Formula $N$
\end_inset
tends to infinity, the discrete probabilities of the binomial are very
accurately approximated by a normal distribution.
Of course, as is evident in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Binomial10"
\end_inset
, that approximation will not work when
\begin_inset Formula $N$
\end_inset
is small.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
N <- 437; p <- 0.63; x1 <- max(0, N*p - 4*sqrt(p*(1-p)*N)); x2 <- min(N*p + 4*sqrt(p*(1-p)*N), N)
\end_layout
\begin_layout Plain Layout
x <- as.integer(x1):as.integer(x2+1)
\end_layout
\begin_layout Plain Layout
pseq <- dbinom(x, N, p)
\end_layout
\begin_layout Plain Layout
plot(x, pseq, type="h", xlab="k", ylab=paste("Prob(k, N=",N,", p=", p,")"))
\end_layout
\begin_layout Plain Layout
points(x, pseq, pch=18,cex=0.5)
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
placement h
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tBinomial20}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:BinomialwithN=437"
\end_inset
Binomial with N=437 and p=0.63
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Cumulative Distribution
\end_layout
\begin_layout Standard
One of the big problems with analysis of continuous distributions is that
the cumulative distribution function cannot be simplified.
Numerical approximation is required, and there are known problems (and
solutions) for that.
When a distribution is discrete, no approximation is required.
We simply need to calculate a sum.
\end_layout
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
The expected value is:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x]=\pi\cdot N\label{eq:BinomialExpectedValue}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
and the variance is
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
Var[x]=\pi(1-\pi)N\label{eq:BinomialVariance}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
It seems obvious to me that this expected value is correct.
If we flip a coin 10 times and the chance of a
\begin_inset Quotes eld
\end_inset
head
\begin_inset Quotes erd
\end_inset
is
\begin_inset Formula $\pi$
\end_inset
, it seems reasonable to expect
\begin_inset Formula $\pi\cdot10$
\end_inset
heads.
\end_layout
\begin_layout Standard
There is a simple way to demonstrate that.
Think of the outcome, the number of successes, as a sum of 0's and 1's.
For instance, the observed sample:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
0,1,1,0,1,1,0,0\ldots,1,0
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
is really just a realization of Bernoulli trials, and the number of successes
is just the sum of those trials, as in
\begin_inset Formula
\begin{equation}
x_{1}+x_{2}+x_{3}+\ldots+x_{N-1}+x_{N}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
Those are
\begin_inset Quotes eld
\end_inset
statistically independent
\begin_inset Quotes erd
\end_inset
samples of size 1, and each one has probability of success equal to
\begin_inset Formula $\pi$
\end_inset
.
So, considering just one
\begin_inset Quotes eld
\end_inset
event
\begin_inset Quotes erd
\end_inset
in isolation, the chance is
\begin_inset Formula $\pi$
\end_inset
of observing a
\begin_inset Formula $1$
\end_inset
and
\begin_inset Formula $(1-\pi)$
\end_inset
chance of observing
\begin_inset Formula $0$
\end_inset
.
So the expected value of that one draw is
\begin_inset Formula
\begin{equation}
E[x_{1}]=\pi\cdot1+(1-\pi)\cdot0=\pi
\end{equation}
\end_inset
So if you think of the Binomial as the sum of
\begin_inset Formula $N$
\end_inset
of those experiments,
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{eqnarray}
E[x_{1}+x_{2}+\ldots x_{N}] & = & E[x_{1}]+E[x_{2}]+\ldots+E[x_{n}]\nonumber \\
& = & \pi+\pi+\ldots+\pi\nonumber \\
& = & N\cdot\pi
\end{eqnarray}
\end_inset
\end_layout
\begin_layout Standard
The variance can be derived similarly.
Consider just one draw,
\begin_inset Formula $x_{1}$
\end_inset
, in isolation.
Its variance is
\begin_inset Formula
\begin{eqnarray}
Var[x_{1}] & = & \pi(1-E[x_{1}])^{2}+(1-\pi)(0-E[x_{1}])^{2}\nonumber \\
 & = & \pi(1-\pi)^{2}+(1-\pi)(-\pi)^{2}\nonumber \\
 & = & \pi(1-2\pi+\pi^{2})+\pi^{2}-\pi^{3}\nonumber \\
 & = & \pi-2\pi^{2}+\pi^{3}+\pi^{2}-\pi^{3}\nonumber \\
 & = & \pi-\pi^{2}=\pi(1-\pi)
\end{eqnarray}
\end_inset
The Binomial distribution is a sum of
\begin_inset Formula $N$
\end_inset
of those variables, and they are all statistically independent of each
other.
Thus, the law for calculating the variance of a sum of terms applies.
\begin_inset Formula
\begin{eqnarray}
Var[x_{1}+x_{2}+\ldots x_{N}] & = & Var[x_{1}]+Var[x_{2}]+\ldots+Var[x_{N}]\nonumber \\
 & = & \pi(1-\pi)+\pi(1-\pi)+\ldots+\pi(1-\pi)\nonumber \\
 & = & \pi(1-\pi)N
\end{eqnarray}
\end_inset
\end_layout
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
In statistical modeling research, the most common use of the binomial distribution is in regression with categorical
\begin_inset Quotes eld
\end_inset
Yes
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
No
\begin_inset Quotes erd
\end_inset
outcomes.
These may be
\begin_inset Quotes eld
\end_inset
logistic
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
probit
\begin_inset Quotes erd
\end_inset
regression models.
Suppose we group observations into 5 categories.
For each category, we collect data on trial conditions (these will act
as predictors) as well as the outcomes,
\begin_inset Formula $Y_{i}$
\end_inset
successes out of
\begin_inset Formula $N_{i}$
\end_inset
trials (and thus the observed success rate is
\begin_inset Formula $Y_{i}/N_{i}$
\end_inset
).
Using the predictors (or whatever other information we have handy), we
calculate predicted success rates for the 5 groups,
\begin_inset Formula $(\pi_{1},\pi_{2},\pi_{3},\pi_{4},\pi_{5})$
\end_inset
.
For each group, it is necessary to calculate how likely we were to observe
\begin_inset Formula $Y_{i}$
\end_inset
successes, and the product of those probabilities is the likelihood function.
We would like to adjust our predictive approach so as to maximize the likelihood, of course.
\end_layout
\begin_layout Subsection
Poisson Distribution
\end_layout
\begin_layout Standard
The Poisson is a discrete distribution with outcomes in the set
\begin_inset Formula $0,1,\ldots,\infty$
\end_inset
.
It is commonly used to describe
\begin_inset Quotes eld
\end_inset
event counts.
\begin_inset Quotes erd
\end_inset
A Poisson distribution was used to generate Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:PoissonDogEscapes"
\end_inset
.
\end_layout
\begin_layout Subsubsection
Probability Mass Function
\end_layout
\begin_layout Standard
The Poisson has a single parameter, which is customarily known as
\begin_inset Formula $\lambda$
\end_inset
.
The probability that there are
\begin_inset Formula $x$
\end_inset
occurrences in a time interval is:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
f(x;\lambda)=e^{-\lambda}\frac{\lambda^{x}}{x!},\, where\, x\geq0,\lambda>0\label{eq:PoissonPMF}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
where
\begin_inset Formula $e$
\end_inset
is Euler's constant and
\begin_inset Formula $x!$
\end_inset
is the factorial of
\begin_inset Formula $x$
\end_inset
(
\begin_inset Formula $x!=x\cdot(x-1)\cdot\cdots\cdot2\cdot1$
\end_inset
).
This probability model can be derived in several ways.
The famous French mathematician Siméon Poisson proposed this model in the
mid 1800s, reasoning as follows.
Begin with the idea of time passing in
\begin_inset Quotes eld
\end_inset
small chunks,
\begin_inset Quotes erd
\end_inset
\begin_inset Formula $\Delta t$
\end_inset
.
Suppose the chance of one event during that time is approximately
\begin_inset Formula $\lambda\cdot\Delta t$
\end_inset
(and, as
\begin_inset Formula $\Delta t$
\end_inset
shrinks to
\begin_inset Formula $0$
\end_inset
,
\begin_inset Formula $\lambda\Delta t$
\end_inset
approximates the chance of an event more and more closely).
Assume further that the chance of a second event in a particular chunk
of time is vanishingly small.
Then the analysis of some differential equations results in the formula
proposed in (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:PoissonPMF"
\end_inset
).
Another derivation begins with a binomial distribution,
\begin_inset Formula $B(N,\pi)$
\end_inset
.
Let
\begin_inset Formula $\lambda=\pi N$
\end_inset
, so
\begin_inset Formula $\pi=\lambda/N$
\end_inset
.
Insert that into the binomial probability mass function, and let
\begin_inset Formula $N$
\end_inset
grow arbitrarily large and let
\begin_inset Formula $\pi$
\end_inset
grow smaller.
Several limits must be calculated, and one finds that when
\begin_inset Formula $\pi$
\end_inset
is not very large, the Poisson pmf very closely approximates
\begin_inset Formula $B(N,\pi)$
\end_inset
as
\begin_inset Formula $N\rightarrow\infty$
\end_inset
.
Hence, when it is difficult to calculate
\begin_inset Formula $B(N,\pi)$
\end_inset
because
\begin_inset Formula $N$
\end_inset
is large, the Poisson model can serve as a reasonable approximation.
\end_layout
\begin_layout Standard
The term
\begin_inset Formula $e^{-\lambda}$
\end_inset
(same as
\begin_inset Formula $1/e^{\lambda})$
\end_inset
is a normalizing constant.
The kernel of this probability model is simply
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\frac{\lambda^{x}}{x!}\label{eq:PoissonKernel}
\end{equation}
\end_inset
The values that this implies are presented in Table
\begin_inset CommandInset ref
LatexCommand ref
reference "tab:KernelofPoisson"
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Float table
wide false
sideways false
status open
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Kernel of Poisson Probability Model
\begin_inset CommandInset label
LatexCommand label
name "tab:KernelofPoisson"
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Tabular
<lyxtabular version="3" rows="2" columns="9">
<features tabularvalignment="middle">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<column alignment="center" valignment="top" width="0pt">
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
x
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
0
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
2
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
3
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
4
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
5
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\ldots$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\infty$
\end_inset
\end_layout
\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\lambda^{x}/x!$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
1
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\lambda^{1}$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\lambda^{2}/2!$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\lambda^{3}/3!$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\lambda^{4}/4!$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\lambda^{5}/5!$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\ldots$
\end_inset
\end_layout
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\begin_layout Plain Layout
\begin_inset Formula $\lambda^{\infty}/\infty!$
\end_inset
\end_layout
\end_inset
</cell>
</row>
</lyxtabular>
\end_inset
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\begin_layout Standard
The sum of the items in the second row is
\begin_inset Formula
\begin{equation}
exp(\lambda)=1+\lambda+\lambda^{2}/2!+\lambda^{3}/3!+\lambda^{4}/4!+\lambda^{5}/5!+\ldots+\lambda^{\infty}/\infty!\label{eq:infsum}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
The right side is equal to
\begin_inset Formula $exp(\lambda)$
\end_inset
because that is the definition of the
\begin_inset Formula $exp$
\end_inset
function.
To verify that, differentiate the sum in expression (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:infsum"
\end_inset
) and notice the result is exactly the same sum.
That's a good piece of evidence that it really is
\begin_inset Formula $exp(\lambda)$
\end_inset
.
\end_layout
\begin_layout Standard
It is obvious that the values depend on whether
\begin_inset Formula $\lambda$
\end_inset
is greater than
\begin_inset Formula $1$
\end_inset
.
If
\begin_inset Formula $\lambda$
\end_inset
is less than one, then the most likely outcome is always
\begin_inset Formula $0$
\end_inset
and higher values are progressively less likely.
On the other hand, if
\begin_inset Formula $\lambda$
\end_inset
is greater than
\begin_inset Formula $1$
\end_inset
, then the story is quite different.
There is a
\begin_inset Quotes eld
\end_inset
race
\begin_inset Quotes erd
\end_inset
between the numerator, which is growing rapidly, and the denominator, which
will always win out in the end, but will trail in the early stages.
\end_layout
\begin_layout Standard
\begin_inset Branch R
status open
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
<<>>=
\end_layout
\begin_layout Plain Layout
x <- 0:10
\end_layout
\begin_layout Plain Layout
y <- dpois(x, lambda=1.3)
\end_layout
\begin_layout Plain Layout
par(mfcol=c(1,2))
\end_layout
\begin_layout Plain Layout
plot(x,y, type="h", lty=4, xlab=expression(x ~~ group("(",lambda==1.3,")")), ylab="Probability")
\end_layout
\begin_layout Plain Layout
points(x,y, pch=16)
\end_layout
\begin_layout Plain Layout
y <- dpois(x, lambda=4.0)
\end_layout
\begin_layout Plain Layout
plot(x,y, type="h", lty=4, xlab=expression(x ~~ group("(",lambda==4.0,")")), ylab="Probability")
\end_layout
\begin_layout Plain Layout
points(x,y, pch=16)
\end_layout
\begin_layout Plain Layout
par(mfcol=c(1,1))
\end_layout
\begin_layout Plain Layout
@
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
includegraphics{plots/tPois50}
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
Poisson Mass Function with
\begin_inset Formula $\lambda=1.3$
\end_inset
and
\begin_inset Formula $4.0$
\end_inset
\begin_inset CommandInset label
LatexCommand label
name "fig:Poisson50"
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsubsection
Moments
\end_layout
\begin_layout Standard
The expected value is equal to its variance, and both of them are equal
to
\begin_inset Formula $\lambda$
\end_inset
.
\begin_inset Formula
\[
E(x)=\lambda
\]
\end_inset
\begin_inset Formula
\[
Var(x)=\lambda
\]
\end_inset
\end_layout
\begin_layout Standard
My first inclination was to avoid proving this result because I thought
it required the use of moment generating functions from mathematical statistics.
However, a colleague demonstrated a simpler method of deriving the
expected value.
Because the term for a zero outcome contributes nothing, the expected value is a sum over the strictly positive outcomes:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x]=\sum_{i=1}^{\infty}x_{i}\cdot e^{-\lambda}\frac{\lambda^{x_{i}}}{x_{i}!},
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Shift the summation index so that it begins at
\begin_inset Formula $i=0$
\end_inset
rather than
\begin_inset Formula $i=1$
\end_inset
, which replaces each
\begin_inset Formula $x_{i}$
\end_inset
with
\begin_inset Formula $(x_{i}+1)$
\end_inset
.
\begin_inset Formula
\begin{equation}
E[x]=\sum_{i=0}^{\infty}(x_{i}+1)\cdot e^{-\lambda}\frac{\lambda^{(x_{i}+1)}}{(x_{i}+1)!}
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
Remove the common factor
\begin_inset Formula $\lambda$
\end_inset
from the summation and cancel like terms in the numerator and denominator:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
E[x]=\lambda\sum_{i=0}^{\infty}e^{-\lambda}\frac{\lambda^{x_{i}}}{x_{i}!}.
\end{equation}
\end_inset
Note that the sum must be 1.0, because we are adding up all of the probabilities
from
\begin_inset Formula $0$
\end_inset
to
\begin_inset Formula $\infty$
\end_inset
.
\begin_inset Formula
\begin{equation}
E[x]=\lambda\times1=\lambda
\end{equation}
\end_inset
The proof that the variance is also
\begin_inset Formula $\lambda$
\end_inset
follows from the same type of argument in which we calculate
\begin_inset Formula $E[x^{2}]$
\end_inset
.
\end_layout
\begin_layout Subsubsection
Comments
\end_layout
\begin_layout Standard
Poisson regression became something of a fad in the 1990s.
Many variables were counts, and a model predicting a mean for each case
(
\begin_inset Formula $\lambda_{i}$
\end_inset
) is relatively easy to construct.
By far, the most commonly used predictive model is the exponential form,
\begin_inset Formula
\begin{equation}
\lambda_{i}=\exp(\beta_{0}+\beta_{1}z_{i})
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
where
\begin_inset Formula $z_{i}$
\end_inset
is a predictor (independent variable).
The Poisson regression falls into the family of generalized linear models,
and there are standard routines for calculating estimates of the parameters
\begin_inset Formula $\beta_{0}$
\end_inset
and
\begin_inset Formula $\beta_{1}$
\end_inset
for any of the models in that class.
\end_layout
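To make the exponential link concrete, here is a small simulation sketch (in Python rather than this document's R, purely for illustration; the coefficient values and the Knuth-style Poisson sampler are my own assumptions, not from the text). It draws counts whose mean is exp(β0 + β1·z) and checks that the sample mean tracks λ_i:

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate via Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
b0, b1 = 0.5, 0.8                    # illustrative coefficients only
for z in (0.0, 1.0, 2.0):
    lam = math.exp(b0 + b1 * z)      # exponential link: lambda_i = exp(b0 + b1*z)
    draws = [rpois(lam, rng) for _ in range(20000)]
    print(z, lam, sum(draws) / len(draws))  # sample mean stays near lambda_i
```

A GLM routine would estimate b0 and b1 from such data; the sketch only shows the data-generating side of the model.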
\begin_layout Standard
In many data sets, the application of the Poisson model will be somewhat
wanting because the data will exhibit more dispersion than the model predicts.
In that case, it is now common to revise the model to include a multiplicative
random error that is drawn from a
\begin_inset Formula $Gamma(\alpha,1/\alpha)$
\end_inset
distribution.
Call that gamma variable
\begin_inset Formula $\varepsilon_{i}$
\end_inset
and the new model becomes
\begin_inset Formula
\begin{equation}
\lambda_{i}=\varepsilon_{i}\times\exp(\beta_{0}+\beta_{1}z_{i})
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
That generates a model with the same expected value (since
\begin_inset Formula $E[\varepsilon_{i}]=1$
\end_inset
), but more variance.
The primary reason for using that particular type of random variable is
that the combined effect of mixing in
\begin_inset Formula $\varepsilon_{i}$
\end_inset
and then drawing from
\begin_inset Formula $Poisson(\lambda_{i})$
\end_inset
is known to produce a variable that follows the so-called negative binomial
distribution.
That distribution has the same expected value,
\begin_inset Formula $\lambda_{i}$
\end_inset
, but a larger variance.
\end_layout
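The overdispersion effect is easy to see by simulation. The sketch below (Python for illustration; the values of λ and α are my own hypothetical choices) mixes a Gamma(α, 1/α) multiplicative error into the Poisson mean: the sample mean stays near λ, while the variance climbs toward the negative binomial value λ + λ²/α.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(7)
lam, alpha = 5.0, 2.0                # illustrative values only
counts = []
for _ in range(40000):
    eps = rng.gammavariate(alpha, 1.0 / alpha)  # Gamma(alpha, 1/alpha): mean 1
    counts.append(rpois(eps * lam, rng))        # Poisson with inflated mean
m = sum(counts) / len(counts)
v = sum((c - m) ** 2 for c in counts) / len(counts)
# Mean stays near lam = 5, but the variance exceeds it
# (theoretical negative binomial variance: lam + lam**2 / alpha = 17.5).
print(m, v)
```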
\begin_layout Subsection
If I Had an Infinite Amount of Time and Space
\end_layout
\begin_layout Standard
When I started drafting this essay, I believed there were 10 distributions
that most applied researchers would have in mind on a day-to-day basis.
I've discussed 10 distributions, and I think I would like to revise my
estimate to 14 or 15.
If I had an infinite amount of time, I would write little reports on (at
least) the following distributions:
\end_layout
\begin_layout Enumerate
Negative binomial.
The output of a Poisson process with additional randomness is negative
binomial.
\end_layout
\begin_layout Enumerate
Multivariate normal.
The outcome is a vector of correlated random variables.
\end_layout
\begin_layout Enumerate
Multinomial.
This helps to figure out the chances of getting (142 red, 213 blue, 187
orange, 258 brown, 200 green) in a bag of 1000
\emph on
M&M
\emph default
candies.
In contrast, the binomial only helps to figure out the chances of getting
(455 white, 545 pink) in a bag of
\emph on
Good & Plenty
\emph default
.
\end_layout
\begin_layout Enumerate
Dirichlet.
This extends the beta distribution to multiple dimensions.
It allows us to talk about the possibility that the
\emph on
M&M Mars
\emph default
company does not always use the same probability mixture for their bags
of candy.
Perhaps they choose
\begin_inset Formula $(\pi_{1},\pi_{2},\pi_{3},\pi_{4},\pi_{5})$
\end_inset
according to some random scheme.
\end_layout
\begin_layout Enumerate
Wishart.
The output of this process is a square matrix that can be interpreted as
a covariance matrix.
\end_layout
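The multinomial calculation from the M&M example above can be sketched directly (Python for illustration; the equal color probabilities of 0.2 are purely hypothetical, since the real mix is unknown). Working on the log scale via the log-gamma function avoids overflow in the factorials:

```python
import math

def multinomial_logpmf(counts, probs):
    """Log of the multinomial pmf: log[ n!/(prod k_i!) * prod p_i^k_i ]."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return log_coef + sum(k * math.log(p) for k, p in zip(counts, probs))

counts = [142, 213, 187, 258, 200]   # red, blue, orange, brown, green
probs = [0.2] * 5                    # assumed equal mix -- hypothetical
logp = multinomial_logpmf(counts, probs)
print(logp)   # a large negative number: any one exact split is unlikely
```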
\begin_layout Section
Conclusions
\end_layout
\begin_layout Standard
In some ways, I feel encouraged by the state of probability modeling.
Compared to the time when I was in graduate school, when computers were
inaccessible and there was little understanding of distributions except
for the normal, t, and F distributions, we have an enormous amount of conceptual
power at our fingertips.
Any probability model one can imagine can be brought into use relatively
easily.
If one can offer a plausible reason for investigation of that new model,
then others will probably want to consider variations on the parameters
and domain.
\end_layout
\begin_layout Standard
As a case in point, I would mention the growth of interest in the so-called
skewed distributions.
The most prominent is the skew-normal distribution, which is obtained by
taking the normal pdf,
\begin_inset Formula $f(x_{i};\mu,\sigma^{2})$
\end_inset
and multiplying by a skew factor, which is
\begin_inset Formula $2$
\end_inset
times the cdf of the normal.
\begin_inset Formula
\begin{equation}
\mbox{skew-normal pdf}:\, f(x;\mu,\sigma^{2})\times2\times\int_{-\infty}^{\alpha x}f(t;0,1)\, dt.
\end{equation}
\end_inset
\begin_inset Newline newline
\end_inset
If the skew parameter
\begin_inset Formula $\alpha=0.0$
\end_inset
, then the skew disappears and this is just the same old normal.
As far as I can tell, this skew framework was originally proposed in
1985 (Azzalini, A.
(1985).
"A class of distributions which includes the normal ones".
Scand.
J.
Statist.
12: 171–178) and there are now proposals for skew versions of most distributions
(the ones we previously thought were symmetric, such as the t).
\end_layout
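The skew-normal density just described is short enough to write out (a sketch in Python for the standard case μ = 0, σ = 1; the normal cdf is expressed through the error function):

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def skew_normal_pdf(x, alpha):
    """Azzalini (1985) skew-normal density: 2 * phi(x) * Phi(alpha * x)."""
    return 2.0 * phi(x) * Phi(alpha * x)

# With alpha = 0 the skew factor is 2 * 0.5 = 1, so the density
# reduces to the plain standard normal.
print(skew_normal_pdf(0.7, 0.0), phi(0.7))
```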
\begin_layout Standard
I am encouraged, but also frightened because it appears to grow more and
more difficult for part-timers like me to comprehend the magnitude of the
probability modeling enterprise.
In a 2008 article for the Teacher's Corner in The American Statistician,
Leemis and McQueston assembled the single most overwhelming piece of line
art I have ever seen.
A snapshot is reproduced in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:LeemisandMcQueston"
\end_inset
:
\end_layout
\begin_layout Standard
\begin_inset Float figure
placement h
wide false
sideways false
status open
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
\begin_inset Graphics
filename importfigs/LeemisMcQueston2008.pdf
width 6.5in
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:LeemisandMcQueston"
\end_inset
Leemis and McQueston Distribution Diagram
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\begin_layout Standard
If I were a graduate student, here is where I would start.
\end_layout
\begin_layout Standard
First, I would probably take three or four more math courses than I took.
I took calculus for engineering students, linear algebra, and sat in on
courses in differential equations and mathematical statistics.
Sitting in is never as good as actually taking the courses, and I've often
regretted that I did not take the time.
I never enrolled in real analysis, but I wish I had.
If you are a student who has already taken these courses, and you think
you know everything, go find some physicists who do asymptotic distribution
theory.
There is plenty more to learn.
\end_layout
\begin_layout Standard
Second, I would explore as many distributions as I could find prepackaged
for whatever statistical software is in vogue in the future.
At one time, that was SAS; now it seems to be S+/R.
I would try to plot the pdfs and cdfs and explore the sampling distributions
of their means.
I'd try to verify the textbook claims about the way in which some function
of a random variable from one distribution converges to another one.
I'd write short summaries of the distributions for my use.
\end_layout
\begin_layout Standard
Third, I'd get a copy of an open-source scientific programming library,
such as the GNU Scientific Library or CERN's COLT, and I would study ways
to integrate those tools with my preferred statistical modeling tools.
It seems certain to me that when new statistical distributions appear on
the scene, they will first be offered in a low-level programming language
like C, Fortran, or Java, and a familiarity with a scientific programming
library will facilitate the integration of those new distributions with
my collection.
\end_layout
\begin_layout Standard
Finally, I'd find a practitioner of Bayesian statistics.
Even if you don't choose to be a Bayesian, it is still likely that working
with one of them will help.
Bayesians are, almost by definition, forced to live in a forest of statistical
distributions and they need ways to make those distributions work together,
more or less.
\end_layout
\end_body
\end_document