Paul E. Johnson
2015-02-18
Draw this
\[ y = 2 + 1 x \]
Set the range of x from 0 to 10
Draw this
\[ y = 2 + 1 x \]
\[ y = 4 + 1 x \]
What is the difference between the two lines?
Can you tell me a story about some relationship that might “look like that”?
Draw
\[ y = 2 + 1 x \]
\[ y = 2 + 0 x \]
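One way to see how these lines differ is to compute them over the suggested range; here is a minimal Python sketch (NumPy only, with the actual plotting left as an exercise):

```python
import numpy as np

# x ranges from 0 to 10, as in the exercise
x = np.linspace(0, 10, 101)

# the three lines from the exercises above
y1 = 2 + 1 * x   # y = 2 + 1x
y2 = 4 + 1 * x   # y = 4 + 1x: same slope, intercept shifted up by 2
y3 = 2 + 0 * x   # y = 2 + 0x: slope of zero, a flat line at y = 2

# the two parallel lines differ by a constant 2 everywhere
print(np.allclose(y2 - y1, 2))  # True
```

The point of the exercise shows up numerically: changing the intercept shifts the whole line by a constant, and a zero slope means y does not respond to x at all.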
Personal Anxiety Alert!
Some people think \(R^2\) is a summary of the goodness of a regression (a “model fit index”)
Clearly the regression model \(y_i=\beta_0 + \beta_{1} x_i + e_i\) is linear, but
that’s not what statisticians mean when they say OLS is a “linear estimator.”
\[ \hat{\beta}_{1}^{OLS} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ \hat{\beta}_{1}^{OLS} =\sum_{i=1}^{N}\left(\frac{x_{i}-\bar{x}}{\sum(x_{i}-\bar{x})^{2}}\times y_{i}\right) \]
\[ SS_x = \sum (x_i - \bar{x})^2 \]
The terms \( h_i = \frac{x_{i}-\bar{x}}{SS_x} \) are weights, applied one-by-one in the sum.
Hence, \(\hat{\beta}_{1}\) is a weighted sum of the observed scores: \[ \hat{\beta}_{1}^{OLS} = h_1 \times y_1 + h_2 \times y_2 + \cdots + h_N \times y_N \]
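The claim that the ratio form and the weighted-sum form give the same slope can be checked directly. A small Python sketch, using made-up illustrative data (the sample size, true coefficients, and noise level are all assumptions for the demonstration):

```python
import numpy as np

# illustrative data: y = 2 + 1x + noise, the line from the earlier exercise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 1 * x + rng.normal(0, 1, size=50)

ss_x = np.sum((x - x.mean()) ** 2)   # SS_x
h = (x - x.mean()) / ss_x            # the weights h_i

# ratio form of the OLS slope estimator
b1_ratio = np.sum((x - x.mean()) * (y - y.mean())) / ss_x

# weighted-sum form: sum of h_i * y_i
b1_weighted = np.sum(h * y)

print(b1_ratio, b1_weighted)  # identical up to floating point
```

Note also that the weights sum to zero, which is why subtracting \(\bar{y}\) from each \(y_i\) (as in the first formula) changes nothing.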
In the paradigm that we are using, \(x_i\) is “fixed”: it is a non-random quantity. The only observed random variable is \(y_i\).
Each weight \(h_i\) is calculated from the predictors alone; we could cycle through the data, row by row, and calculate that number.
If \(x\) is centered, so that \(\bar{x}=0\), the estimator simplifies:
\[ \hat{\beta}_{1}^{OLS}=\frac{\sum x_{i}y_{i}}{\sum x_{i}^{2}} \]
\[ \hat{\beta}_{1}^{OLS} = \sum\left(\frac{x_{i}}{\sum x_{i}^{2}}\times y_{i}\right) \]
\[ \hat{\beta}_{1}^{OLS} = \sum\left(\frac{x_{i}}{SS_x}\times y_{i}\right) \]
\[ \hat{\beta}_{1}^{OLS} = \sum\left(h_i \times y_{i}\right) \]
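A quick Python check that centering \(x\) leaves the slope unchanged, so the simplified formula agrees with the full one (again with illustrative data chosen only for the demonstration):

```python
import numpy as np

# illustrative data, as before
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2 + 1 * x + rng.normal(0, 1, size=50)

xc = x - x.mean()   # centered predictor, so the mean of xc is 0

# full formula, with deviations of both x and y
b1_full = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)

# simplified formula for centered x: sum(x_i y_i) / sum(x_i^2)
b1_simple = np.sum(xc * y) / np.sum(xc ** 2)

print(np.isclose(b1_full, b1_simple))  # True
```

Since centering \(x\) only shifts the line horizontally, the slope, and hence \(\hat{\beta}_{1}\), is unaffected.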
Comments
I’m not certain we should teach “correlation” at all, by itself
Correlation speaks to a certain “prejudice” about relationships that I don’t hold