Paul E. Johnson

2015-02-04

Quick PDF sketch

Normal (too easy)

Gamma

Beta

Logistic

Poisson

Binomial

`length(x1)`

`## [1] 2000`

`x1[1:10]`

```
## [1] 178.86 -225.95 371.74 133.33 311.81 177.53 368.16 363.94
## [9] 73.85 335.23
```

`rockchalk::summarize(x1)`

```
## $numerics
## x1
## 0% -549.42
## 25% -49.50
## 50% 87.25
## 75% 223.28
## 100% 757.48
## mean 84.84
## sd 196.60
## var 38653.11
## NA's 0.00
## N 2000.00
##
## $factors
## NULL
```

I am not: a) Poisson b) Normal c) Uniform d) Beta e) Binomial

I am not: a) Poisson b) Normal c) Gamma d) Beta e) Binomial

`x1 <- rnorm(2000, 88, 200)`
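A quick check of the reveal, in base R (the seed here is my own choice, not from the slides): a fresh draw from the same Normal should land near the mean of 84.84 and sd of 196.60 that `summarize` reported above.

```r
# Redraw from the revealed distribution and compare to the theory
set.seed(1234)  # seed chosen for reproducibility
x1 <- rnorm(2000, mean = 88, sd = 200)
mean(x1)  # should be near 88
sd(x1)    # should be near 200
```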

`x2[1:10]`

`## [1] 2.2661 0.9260 0.6397 0.3697 0.8037 4.2529 2.6449 1.1851 0.4970 0.7945`

`summarize(x2)`

```
## $numerics
## x2
## 0% 0.027
## 25% 0.964
## 50% 1.714
## 75% 2.680
## 100% 10.756
## mean 1.991
## sd 1.377
## var 1.896
## NA's 0.000
## N 2000.000
##
## $factors
## NULL
```

I am not: a) Poisson b) Normal c) Uniform d) Beta

I am not: a) Poisson b) Logistic c) Uniform d) Beta

`x2 <- rgamma(2000, 2, 1)`
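For a Gamma with shape 2 and rate 1, the theoretical mean is shape/rate = 2 and the variance is shape/rate² = 2, which matches the 1.991 and 1.896 in the summary above. A sketch (seed is my choice):

```r
# Gamma(shape = 2, rate = 1): E[x] = 2, Var[x] = 2
set.seed(1234)
x2 <- rgamma(2000, shape = 2, rate = 1)
c(mean = mean(x2), var = var(x2))  # both should be near 2
```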

`x3[1:10]`

`## [1] 2 2 3 3 2 5 3 3 4 3`

`rockchalk::summarize(x3)`

```
## $numerics
## x3
## 0% 0.000
## 25% 2.000
## 50% 3.000
## 75% 4.000
## 100% 9.000
## mean 2.959
## sd 1.447
## var 2.094
## NA's 0.000
## N 2000.000
##
## $factors
## NULL
```

I am not: a) Poisson b) Normal c) Uniform d) Beta e) Binomial

I am not: a) Poisson b) Normal c) Gamma d) Beta

Actually, I was `rbinom(2000, 10, prob = 0.3)`
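That reveal checks out against the summary above: a Binomial(n = 10, p = 0.3) has mean np = 3 and variance np(1 − p) = 2.1, next to the observed 2.959 and 2.094. A sketch (seed is my choice):

```r
# Binomial(10, 0.3): E[x] = n*p = 3, Var[x] = n*p*(1 - p) = 2.1
set.seed(1234)
x3 <- rbinom(2000, size = 10, prob = 0.3)
c(mean = mean(x3), var = var(x3))
```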

`drawHist(x1)`

`drawHist(x2)`

`drawHist(x3)`

- Expected Value

Probability weighted sum of outcomes

- Outcomes are 1, 2, 3
- Probabilities are 0.5, 0.4, and 0.1

Calculate the Expected Value: ?

0.5 * 1 + 0.4 * 2 + 0.1 * 3 = 1.6

Easy!
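The same weighted sum in R, using the outcomes and probabilities from the slide:

```r
# Expected value as a probability-weighted sum of outcomes
outcomes <- c(1, 2, 3)
probs <- c(0.5, 0.4, 0.1)
sum(probs * outcomes)  # 1.6
```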

- Common, everyday meaning of “expected value”
- Is that ever what a lay person would “expect”?
- For which distributions is this “informative”? (unimodal, symmetric)
- When is it not subjectively informative? (others)

- It's still a fixed characteristic of the probability process
- Estimate it!
- Sometimes we engage in this game
- Stipulate a theoretical distribution
- Calculate the average from a sample
- Note the sample is “so far” from what we theorized about the EV that the theory must be wrong!

```
x1 <- rpois(5000, lambda = 0.2)
hist(x1, main = "My EV is 0.2")
```

```
x2 <- rpois(5000, lambda = 10)
hist(x2, main = "My EV is 10")
```

- please make note of these numbers

```
mean(x1)
mean(x2)
sd(x1)
sd(x2)
```

```
x1 <- rpois(5000, lambda = 0.2)
hist(x1, main = "My EV is 0.2")
```

- Sometimes I notice that R’s `hist` doesn’t work so well with discrete variables. I have ideas about that, but don’t want to bother you with them now.

```
x2 <- rpois(5000, lambda = 10)
hist(x2, main = "My EV is 10")
```

- Wedge those into the same figure with `par(mfcol = c(2, 1))`
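The wedging can be sketched like this in base R (restoring the layout afterward so later plots aren't affected):

```r
# Two stacked histograms in one figure
x1 <- rpois(5000, lambda = 0.2)
x2 <- rpois(5000, lambda = 10)
par(mfcol = c(2, 1))
hist(x1, main = "My EV is 0.2")
hist(x2, main = "My EV is 10")
par(mfcol = c(1, 1))  # restore the default one-panel layout
```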

- The mean

\[ \bar{x}=\frac{\text{sum of observed values}}{N} \]

- Need criteria to say if \(\bar{x}\) is a “good” estimate of \(E[x]\)

- Later in term
- unbiased

\(\bar{x}\) flutters around \(E[x]\)

- consistent

As your sample grows, the gap between \(\bar{x}\) and \(E[x]\) will probably shrink
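A consistency sketch (the sample sizes and seed are my own choices): draw larger and larger Poisson samples and watch the gap between \(\bar{x}\) and \(E[x]\) shrink.

```r
# Gap between the sample mean and E[x] at increasing sample sizes
set.seed(1234)
ev <- 10  # E[x] for a Poisson with lambda = 10
gaps <- sapply(c(10, 1000, 100000), function(n) {
  abs(mean(rpois(n, lambda = ev)) - ev)
})
gaps  # typically shrinks as n grows
```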

- This is a discrete variable, so we’d say it has “Probability Mass”, technically different from a “continuum”
- Easiest math
- Probability of an outcome simply given to us
- Easy to see addition and multiplication rules.
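With a discrete variable, R hands the masses to us directly through the `d`-prefixed functions. Binomial(10, 0.3) here is just a convenient example (it was the quiz answer above):

```r
# Probability mass: dbinom gives P(x = k) directly
pmf <- dbinom(0:10, size = 10, prob = 0.3)
sum(pmf)  # the masses sum to 1
# Addition rule for disjoint outcomes: P(x = 2 or x = 3) = P(x = 2) + P(x = 3)
dbinom(2, 10, 0.3) + dbinom(3, 10, 0.3)
```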

- If you can put up with integrals, you can describe these things more clearly.
- PDF, f(x): probability density of a score equal to x
- CDF, F(k): cumulative distribution function. Chance of an observed score smaller than k
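In R the pair is `dnorm`/`pnorm` (and the same `d`/`p` prefixes for the other distributions). The standard Normal here is just an illustration; the CDF is the integral of the PDF up to k.

```r
# PDF and CDF of the standard Normal at a point k
k <- 1.0
dnorm(k)   # f(k): the density at k
pnorm(k)   # F(k): P(x < k), about 0.841 here
# F(k) as the integral of the density from -Inf to k
integrate(dnorm, lower = -Inf, upper = k)$value
```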

- Sketch some PDF (plays per fumble)
- Mark any point k.
- We are almost never interested in the question of a score exactly equal to k
- We are interested in outcomes more extreme than k
- Imagine
- the distribution of what would happen
- the data gives you some estimate k
- we want to know if k is extremely different from model
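That "more extreme than k" question is a tail probability. A sketch with the standard Normal (k = 1.96 is my illustrative choice):

```r
# P(x > k): the tail area beyond an observed k
k <- 1.96
pnorm(k, lower.tail = FALSE)           # upper tail, about 0.025 here
1 - pnorm(k)                           # the same quantity
2 * pnorm(abs(k), lower.tail = FALSE)  # two-sided: extreme in either direction
```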

- I am almost never interested in describing “the people out there from which we are sampling”
- I am interested in estimating the parameters of the process that generates them.

What did you get for `mean(x1)` before?

Sampling distribution

Expected value of \((x - E[x])^2\)

Can we calculate variance of {1, 2, 3} with probabilities {0.5, 0.4, and 0.1}?
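Yes: reuse E[x] = 1.6 from before and take the probability-weighted sum of squared deviations.

```r
# Variance of a discrete distribution: sum of p * (x - E[x])^2
outcomes <- c(1, 2, 3)
probs <- c(0.5, 0.4, 0.1)
ev <- sum(probs * outcomes)     # 1.6, as computed earlier
sum(probs * (outcomes - ev)^2)  # 0.44
```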

- Var[x] is a characteristic of the probability process
- The sample variance is an estimate of it

Differentiate the “theoretical property” from the “sample estimate”

Expected Value <=> sample mean

There’s no special word for “population variance” similar to expected value (thus creating confusion when discussing variance)

- Here, the N and N-1 problem slaps us in the face.
- Intuition: a good formula appears to be \[s1 = \frac{\sum (x - \bar{x})^2}{N}\]
- That is the maximum likelihood estimate

\(E[s1]\neq Var[x]\)

But this one is unbiased:

\[s2 = \frac{\sum (x - \bar{x})^2}{N-1}\]
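A simulation sketch of that bias (sample size, seed, and repetition count are my own choices): across many small samples from a standard Normal, the divide-by-N estimate averages below the true variance of 1, while the divide-by-(N-1) estimate averages right at it.

```r
# Compare the N and N-1 variance estimators over many small samples
set.seed(1234)
n <- 5
reps <- replicate(20000, {
  x <- rnorm(n)  # true Var[x] = 1
  c(mle      = sum((x - mean(x))^2) / n,
    unbiased = sum((x - mean(x))^2) / (n - 1))
})
rowMeans(reps)  # mle averages near (n-1)/n = 0.8; unbiased averages near 1
```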