Visualize (Whirled Peas)

Quick PDF sketch

Normal (too easy)
Gamma
Beta
Logistic
Poisson
Binomial
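
If you would rather let R do the sketching, something like the following works as a rough sketch. The parameter values are arbitrary choices for illustration, not values from the lecture.

```r
# Quick sketches of the six distributions listed above.
# All parameter values here are arbitrary, chosen only to get typical shapes.
op <- par(mfrow = c(2, 3))  # 2 x 3 grid of panels

# Continuous distributions: draw the density function as a curve
curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
      main = "Normal(0, 1)", ylab = "density")
curve(dgamma(x, shape = 2, rate = 1), from = 0, to = 10,
      main = "Gamma(2, 1)", ylab = "density")
curve(dbeta(x, shape1 = 2, shape2 = 5), from = 0, to = 1,
      main = "Beta(2, 5)", ylab = "density")
curve(dlogis(x, location = 0, scale = 1), from = -8, to = 8,
      main = "Logistic(0, 1)", ylab = "density")

# Discrete distributions: probability mass at each integer, drawn as spikes
k <- 0:10
pois_mass <- dpois(k, lambda = 3)
binom_mass <- dbinom(k, size = 10, prob = 0.3)
plot(k, pois_mass, type = "h", main = "Poisson(3)", ylab = "probability")
plot(k, binom_mass, type = "h", main = "Binomial(10, 0.3)", ylab = "probability")

par(op)  # restore the previous graphics settings
```

The last two panels use spikes rather than curves because Poisson and Binomial put probability only on whole numbers.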

Guess that Distribution

I’m Not that

length(x1)

## [1] 2000

x1[1:10]

##  [1]  178.86 -225.95  371.74  133.33  311.81  177.53  368.16  363.94
##  [9]   73.85  335.23

rockchalk::summarize(x1)

## $numerics
##            x1
## 0%    -549.42
## 25%    -49.50
## 50%     87.25
## 75%    223.28
## 100%   757.48
## mean    84.84
## sd     196.60
## var  38653.11
## NA's     0.00
## N     2000.00
## 
## $factors
## NULL

I am not: a) Poisson b) Normal c) Uniform d) Beta e) Binomial

I’m Not that

plot of chunk unnamed-chunk-3

I am not a) Poisson b) Normal c) Gamma d) Beta e) Binomial

Answer:

x1 <- rnorm(2000, 88, 200)

I’m Not that 2

x2[1:10]

##  [1] 2.2661 0.9260 0.6397 0.3697 0.8037 4.2529 2.6449 1.1851 0.4970 0.7945

summarize(x2)

## $numerics
##            x2
## 0%      0.027
## 25%     0.964
## 50%     1.714
## 75%     2.680
## 100%   10.756
## mean    1.991
## sd      1.377
## var     1.896
## NA's    0.000
## N    2000.000
## 
## $factors
## NULL

I am not a) Poisson b) Normal c) Uniform d) Beta

I’m Not that 2

plot of chunk unnamed-chunk-7

I am not a) Poisson b) Logistic c) Uniform d) Beta

Answer

x2 <- rgamma(2000, 2, 1)

I’m Not that 3

x3[1:10]

##  [1] 2 2 3 3 2 5 3 3 4 3

rockchalk::summarize(x3)

## $numerics
##            x3
## 0%      0.000
## 25%     2.000
## 50%     3.000
## 75%     4.000
## 100%    9.000
## mean    2.959
## sd      1.447
## var     2.094
## NA's    0.000
## N    2000.000
## 
## $factors
## NULL

I am not: a) Poisson b) Normal c) Uniform d) Beta e) Binomial

I’m Not that 3

plot of chunk unnamed-chunk-11

I am not a) Poisson b) Normal c) Gamma d) Beta

Answer:

Actually, I was rbinom(2000, 10, prob = c(0.3))

Harvesting from R/WorkingExamples

plot-histogramWithLinesAndLegend.R(html)

drawHist(x1)

plot of chunk unnamed-chunk-12

ex 2

drawHist(x2)

plot of chunk unnamed-chunk-13

ex 3

drawHist(x3)

plot of chunk unnamed-chunk-14

EV: Simple idea with complicated jargon

Expected Value

Probability weighted sum of outcomes

Outcomes are 1, 2, 3
Probabilities are 0.5, 0.4, and 0.1

Calculate the Expected Value: ?

0.5 * 1 + 0.4 * 2 + 0.1 * 3

Easy!
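
The same arithmetic in R, as a minimal sketch. The variable names `outcomes`, `probs`, and `ev` are mine, not part of the lecture.

```r
# Expected value as a probability-weighted sum of outcomes
outcomes <- c(1, 2, 3)
probs <- c(0.5, 0.4, 0.1)

# Probabilities must sum to 1 for this to be a proper distribution
stopifnot(isTRUE(all.equal(sum(probs), 1)))

ev <- sum(probs * outcomes)
ev  # 0.5 * 1 + 0.4 * 2 + 0.1 * 3 = 1.6
```

Note that 1.6 is not one of the possible outcomes, which is part of why the everyday meaning of "expected" can mislead.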

Do you think terminology is the major problem?

Common, everyday meaning of “expected value”
Is that ever what a lay person would “expect”?
For which distributions is this “informative”? (unimodal, symmetric)
When is it not subjectively informative? (others)

Even if EV is not subjectively informative…

It’s still a fixed characteristic of the probability process
Estimate it!
Sometimes we engage in this game
- Stipulate a theoretical distribution
- Calculate the average from a sample
- Note the sample is “so far” from what we theorized about the EV that the theory must be wrong!
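
A rough sketch of that game in R. Everything here is made up for illustration: the theory says Poisson with EV 10, while the simulated data secretly come from a Poisson with EV 12. The yardstick used (distance measured in standard errors) is one common choice, not necessarily the one used later in the course.

```r
set.seed(1234)

# Step 1: stipulate a theoretical distribution (assumed: Poisson, EV = 10)
lambda_theory <- 10
n <- 5000

# The data actually come from a different process (assumed for this demo)
x <- rpois(n, lambda = 12)

# Step 2: calculate the average from the sample
xbar <- mean(x)

# Step 3: how far is the sample mean from the theorized EV?
# Under the theory, Var[x] = lambda, so the standard error of the mean is
# sqrt(lambda / n).
se_theory <- sqrt(lambda_theory / n)
z <- (xbar - lambda_theory) / se_theory
z  # a very large number of standard errors: the theory looks wrong
```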

Got R?

x1 <- rpois(5000, lambda = 0.2)
hist(x1, main = "My EV is 0.2")

x2 <- rpois(5000, lambda = 10)
hist(x2, main = "My EV is 10")

Please make note of these numbers

mean(x1)
mean(x2)
sd(x1)
sd(x2)

I got

x1 <- rpois(5000, lambda = 0.2)
hist(x1, main = "My EV is 0.2")

plot of chunk unnamed-chunk-15

Sometimes I notice that R’s hist doesn’t work so well with discrete variables. Have ideas about that, but don’t want to bother you about them now.
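
One common workaround, offered only as a sketch and not necessarily what the author has in mind: tabulate the distinct values and draw one spike or bar per value, so nothing depends on how hist chooses its bins.

```r
set.seed(1234)
x <- rpois(5000, lambda = 0.2)

# Count how often each distinct value occurs
counts <- table(x)

# plot() on a one-way table draws a vertical line at each observed value
plot(counts, main = "Poisson, lambda = 0.2", xlab = "x", ylab = "frequency")

# Same information as proportions, in a bar plot
props <- counts / length(x)
barplot(props, main = "Poisson, lambda = 0.2", xlab = "x", ylab = "proportion")
```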

x2 <- rpois(5000, lambda = 10)
hist(x2, main = "My EV is 10")

plot of chunk unnamed-chunk-16

Wedge those into same figure with par(mfcol = c(2,1))
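
A minimal sketch of that, reusing the two Poisson samples from the earlier slide. The seed and the names `op`, `h1`, and `h2` are my additions.

```r
set.seed(1234)
x1 <- rpois(5000, lambda = 0.2)
x2 <- rpois(5000, lambda = 10)

# mfcol = c(2, 1): two rows, one column, so the histograms stack vertically
op <- par(mfcol = c(2, 1))
h1 <- hist(x1, main = "My EV is 0.2")
h2 <- hist(x2, main = "My EV is 10")
par(op)  # restore the previous layout
```

Saving the old settings in `op` and restoring them afterward keeps later plots from inheriting the two-panel layout.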

plot of chunk unnamed-chunk-17

Sample average is one way to estimate EV

The mean

\[ \bar{x} = \frac{\text{sum of observed values}}{N} \]

- Need criteria to say if \(\bar{x}\) is a “good” estimate of \(E[x]\)

Later in term
- unbiased
\(\bar{x}\) flutters around \(E[x]\)
- consistent
As your sample grows, gap between \(\bar{x}\) and \(E[x]\) will probably shrink
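
To see the "gap will probably shrink" idea in action, here is a small simulation sketch. The Poisson process with EV 10 and the particular sample sizes are assumptions chosen for the demo.

```r
set.seed(1234)
lambda <- 10  # the true EV of the assumed process

# Draw samples of increasing size and record each sample mean
sizes <- c(10, 100, 1000, 10000, 100000)
xbars <- sapply(sizes, function(n) mean(rpois(n, lambda = lambda)))

# Gap between each sample mean and the true EV
gaps <- abs(xbars - lambda)
data.frame(N = sizes, xbar = xbars, gap = gaps)
```

The gap is not guaranteed to shrink at every step (that is the "probably"), but large gaps become less and less likely as N grows.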

Probabilities. Who Likes Calculus?

This is a discrete variable, so we’d say it has “Probability Mass”, technically different from a “continuum”
Easiest math
Probability of an outcome simply given to us
Easy to see addition and multiplication rules.

Continuous variables

If you can put up with integrals, you can describe these things more clearly.
- PDF, f(x): probability density of a score equal to x
- CDF, F(k): cumulative distribution function. Chance of an observed score smaller than k
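
In R, the d-functions give the PDF and the p-functions give the CDF. A short sketch with a standard normal (my choice of distribution and of k) shows that the CDF is the area under the PDF to the left of k:

```r
k <- 1

# PDF: height of the density curve at k (this is not a probability)
dnorm(k)

# CDF: chance of an observed score smaller than k
pnorm(k)

# The CDF is the integral of the PDF from -Inf up to k
area <- integrate(dnorm, lower = -Inf, upper = k)$value
area  # agrees with pnorm(k) up to numerical error
```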

Extreme scores

Sketch some PDF (plays per fumble)
Mark any point k.
We are almost never interested in the question of a score exactly equal to k
We are interested in outcomes more extreme than k
Imagine
- the distribution of what would happen
- the data gives you some estimate k
- we want to know if k is extremely different from model
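
A sketch of the "more extreme than k" calculation. The model (standard normal) and the estimate k = 2.5 are made up for illustration; they are not the fumble data.

```r
# Assumed model: scores follow a standard normal distribution
k <- 2.5  # hypothetical estimate from the data

# Chance of a score at least as large as k (upper tail)
upper_tail <- pnorm(k, lower.tail = FALSE)

# Chance of a score at least as far from the center as k, in either direction
two_sided <- 2 * upper_tail

upper_tail
two_sided
```

A small tail probability says that k would be unusual if the model were right.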

I hate the word population

Data Generating Process

I am almost never interested in describing “the people out there from which we are sampling”
I am interested in estimating the parameters of the process that generates them.

I want the distribution of an estimate

What did you get for mean(x1) before?
Sampling distribution
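
Everyone in the room got a slightly different mean(x1). A sketch of that idea: repeat the same sampling exercise many times and look at the collection of means. The 1000 replications and the seed are arbitrary choices of mine.

```r
set.seed(1234)

# Repeat the experiment: draw 5000 Poisson scores, keep only the sample mean
means <- replicate(1000, mean(rpois(5000, lambda = 0.2)))

# The histogram of those means approximates the sampling distribution
hist(means, main = "Sampling distribution of the mean, lambda = 0.2")

mean(means)       # centers near the EV, 0.2
sd(means)         # spread of the estimate across repeated samples
sqrt(0.2 / 5000)  # theoretical standard error for a Poisson mean
```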

Theoretical Variance

Variance

Expected value of \((x - E[x])^2\)
Can we calculate variance of {1, 2, 3} with probabilities {0.5, 0.4, and 0.1}?
Var[x] is a characteristic of the probability process
Sample variance is an estimate of it.
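
Yes, we can. A sketch of the calculation, followed by a simulated sample whose variance estimates it. The sample size, the seed, and the variable names are my assumptions.

```r
outcomes <- c(1, 2, 3)
probs <- c(0.5, 0.4, 0.1)

# Theoretical variance: expected value of (x - E[x])^2
ev <- sum(probs * outcomes)                      # 1.6
theory_var <- sum(probs * (outcomes - ev)^2)     # 0.44
theory_var

# Sample variance from simulated draws is an estimate of that fixed number
set.seed(1234)
draws <- sample(outcomes, size = 100000, replace = TRUE, prob = probs)
var(draws)
```

The first number is a property of the process; the second changes every time a new sample is drawn.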

Terminology

Differentiate the “theoretical property” from the “sample estimate”
Expected Value <=> sample mean
There’s no special word for “population variance” similar to expected value (thus creating confusion when discussing variance)

How to calculate sample variance estimates

But it is not unbiased.