2015-02-04

# Visualize (Whirled Peas)

Quick PDF sketch

• Normal (too easy)

• Gamma

• Beta

• Logistic

• Poisson

• Binomial

# I’m Not that

length(x1)
## [1] 2000
x1[1:10]
##  [1]  178.86 -225.95  371.74  133.33  311.81  177.53  368.16  363.94
##  [9]   73.85  335.23
rockchalk::summarize(x1)
## $numerics
##              x1
## 0%      -549.42
## 25%      -49.50
## 50%       87.25
## 75%      223.28
## 100%     757.48
## mean      84.84
## sd       196.60
## var    38653.11
## NA's       0.00
## N       2000.00
## 
## $factors
## NULL

I am not: a) Poisson b) Normal c) Uniform d) Beta e) Binomial

# I’m Not that

I am not a) Poisson b) Uniform c) Beta d) Binomial

x1 <- rnorm(2000, 88, 200)

# I’m Not that 2

x2[1:10]
##  [1] 2.2661 0.9260 0.6397 0.3697 0.8037 4.2529 2.6449 1.1851 0.4970 0.7945
summarize(x2)
## $numerics
##            x2
## 0%      0.027
## 25%     0.964
## 50%     1.714
## 75%     2.680
## 100%   10.756
## mean    1.991
## sd      1.377
## var     1.896
## NA's    0.000
## N    2000.000
## 
## $factors
## NULL

I am not a) Poisson b) Normal c) Uniform d) Beta

# I’m Not that 2

I am not a) Poisson b) Normal c) Uniform d) Beta

x2 <- rgamma(2000, 2, 1)

# I’m Not that 3

x3[1:10]
##  [1] 2 2 3 3 2 5 3 3 4 3
rockchalk::summarize(x3)
## $numerics
##            x3
## 0%      0.000
## 25%     2.000
## 50%     3.000
## 75%     4.000
## 100%    9.000
## mean    2.959
## sd      1.447
## var     2.094
## NA's    0.000
## N    2000.000
## 
## $factors
## NULL

I am not: a) Poisson b) Normal c) Uniform d) Beta e) Binomial

# I’m Not that 3

I am not a) Poisson b) Normal c) Uniform d) Beta

Actually, I was rbinom(2000, size = 10, prob = 0.3)

# plot-histogramWithLinesAndLegend.R (html)

drawHist(x1)

# ex 2

drawHist(x2)

# ex 3

drawHist(x3)

# EV: Simple idea with complicated jargon

• Expected Value

Probability weighted sum of outcomes

• Outcomes are 1, 2, 3
• Probabilities are 0.5, 0.4, and 0.1

Calculate the Expected Value: ?

0.5 * 1 + 0.4 * 2 + 0.1 * 3

Easy!
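That weighted sum can be checked directly in R, using the outcomes and probabilities from the slide:

```r
# outcomes and their probabilities, as given on the slide
vals <- c(1, 2, 3)
probs <- c(0.5, 0.4, 0.1)

# expected value: probability-weighted sum of outcomes
ev <- sum(probs * vals)
ev
## [1] 1.6
```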

# Do you think terminology is the major problem

• Common, everyday meaning of “expected value”
• Is that ever what a lay person would “expect”?
• For which distributions is this “informative”? (unimodal, symmetric)
• When is it not subjectively informative? (others)

# Even if EV is not subjectively informative…

• It’s still a fixed characteristic of the probability process
• Estimate it!
• Sometimes we engage in this game
• Stipulate a theoretical distribution
• Calculate the average from a sample
• Note the sample is “so far” from what we theorized about the EV that the theory must be wrong!

# Got R?

x1 <- rpois(5000, lambda = 0.2)
hist(x1, main = "My EV is 0.2")
x2 <- rpois(5000, lambda = 10)
hist(x2, main = "My EV is 10")
• please make note of these numbers
mean(x1)
mean(x2)
sd(x1)
sd(x2)

# I got

x1 <- rpois(5000, lambda = 0.2)
hist(x1, main = "My EV is 0.2")

• Sometimes I notice that R’s hist() doesn’t work so well with discrete variables. I have ideas about that, but I don’t want to bother you with them now.
x2 <- rpois(5000, lambda = 10)
hist(x2, main = "My EV is 10")

• Wedge those into same figure with par(mfcol = c(2,1))
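A minimal sketch of that two-panel layout, reusing the rpois() draws from above (the seed is my addition, for reproducibility):

```r
set.seed(123)  # seed added for reproducibility; not in the original
x1 <- rpois(5000, lambda = 0.2)
x2 <- rpois(5000, lambda = 10)

# stack the two histograms in one figure
par(mfcol = c(2, 1))
hist(x1, main = "My EV is 0.2")
hist(x2, main = "My EV is 10")
par(mfcol = c(1, 1))  # restore the default single-panel layout
```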

# Sample average is one way to estimate EV

• The mean

$$\bar{x}=\frac{\text{sum of observed values}}{N}$$

• Need criteria to say whether $$\bar{x}$$ is a “good” estimate of $$E[x]$$

• Later in term
• unbiased

$$\bar{x}$$ flutters around E[x]

• consistent

As your sample grows, gap between $$\bar{x}$$ and $$E[x]$$ will probably shrink
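A quick sketch of that shrinking gap. The distribution, seed, and sample sizes here are my choices, not from the slides; the point is only that $$\bar{x}$$ tends to land closer to $$E[x]$$ as N grows:

```r
set.seed(42)  # seed chosen for reproducibility
true_ev <- 5  # E[x] for rnorm(n, mean = 5, sd = 2)

# sample means at growing sample sizes: the gap from E[x] tends to shrink
for (n in c(10, 1000, 100000)) {
  xbar <- mean(rnorm(n, mean = true_ev, sd = 2))
  cat("n =", n, " xbar =", round(xbar, 3),
      " gap =", round(abs(xbar - true_ev), 3), "\n")
}
```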

# Probabilities. Who Likes Calculus?

• This is a discrete variable, so we’d say it has “Probability Mass”, technically different from the probability density of a “continuum”
• Easiest math
• Probability of an outcome simply given to us
• Easy to see addition and multiplication rules.

# Continuous variables

• If you can put up with integrals, you can describe these things more clearly.
• PDF, f(x): probability density of a score equal to x
• CDF, F(k): cumulative distribution function. Probability of an observed score less than or equal to k
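For a concrete case (my example, not from the slides), R provides both functions for the standard normal: dnorm() is the PDF f(x) and pnorm() is the CDF F(k):

```r
# PDF: density of the standard normal at x = 0
dnorm(0)
## [1] 0.3989423

# CDF: probability of a standard normal score at or below k = 1.96
pnorm(1.96)
## [1] 0.9750021
```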

# Extreme scores

• Sketch some PDF (plays per fumble)
• Mark any point k.
• We are almost never interested in the probability of a score exactly equal to k
• We are interested in outcomes more extreme than k
• Imagine
• the distribution of what would happen
• the data gives you some estimate k
• we want to know if k is extremely different from model

# Data Generating Process

• I am almost never interested in describing “the people out there from which we are sampling”
• I am interested in estimating the parameters of the process that generates them.

# I want the distribution of an estimate

• what did you get for mean(x1) before?

• Sampling distribution

# Variance

• Expected value of $$(x - E[x])^2$$

• Can we calculate variance of {1, 2, 3} with probabilities {0.5, 0.4, and 0.1}?

• Var[x] is a characteristic of the probability process
• Sample variance is an estimate of it.
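Yes, we can: using the outcomes and probabilities from the slide, the variance works out like this:

```r
vals <- c(1, 2, 3)
probs <- c(0.5, 0.4, 0.1)

# expected value: probability-weighted sum of outcomes
ev <- sum(probs * vals)            # 1.6

# Var[x]: probability-weighted sum of squared deviations from E[x]
varx <- sum(probs * (vals - ev)^2)
varx
## [1] 0.44
```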

# Terminology

• Differentiate the “theoretical property” from the “sample estimate”

• Expected Value <=> sample mean

• There’s no special word for the theoretical variance analogous to “expected value” (thus creating confusion when discussing variance)

# How to calculate sample variance estimates

• Here, the N and N-1 problem slaps us in the face.
• Intuitively, a good formula appears to be $$s1 = \frac{\sum (x - \bar{x})^2}{N}$$
• That is a maximum likelihood estimate

# But it is not unbiased.

• $$E[s1]\neq Var[x]$$

• But this is unbiased

$$s2 = \frac{\sum (x - \bar{x})^2}{N-1}$$
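A small simulation sketch of that bias (the distribution, seed, and sample size are my choices): over many repeated samples, the N divisor averages below Var[x] while the N-1 divisor centers on it.

```r
set.seed(7)     # seed for reproducibility
true_var <- 4   # Var[x] for rnorm(n, mean = 0, sd = 2)
n <- 5

# draw many small samples; apply both variance formulas to each
reps <- replicate(50000, {
  x <- rnorm(n, mean = 0, sd = 2)
  c(ml       = sum((x - mean(x))^2) / n,        # divide by N (ML, biased)
    unbiased = sum((x - mean(x))^2) / (n - 1))  # divide by N - 1
})

rowMeans(reps)  # ML average sits near 4 * (n - 1) / n = 3.2; unbiased near 4
```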