Ben Kite, Center for Research Methods and Data Analysis, University of Kansas <bakite@ku.edu>
Chong Xing, Center for Research Methods and Data Analysis, University of Kansas <cxing@ku.edu>
Please visit http://crmda.ku.edu/guides
Keywords: Linear Regression, Multiple Outcomes, R, lavaan
2019 January 24
Abstract
This guide outlines how to fit two linear regression models in R using the lavaan package. The results of the fisrt linear model (seven predictors and one outcome) fitted with lavaan are compared to the same model fitted with the lm function. The second model example demonstrates how to use lavaan to estimate a linear regression model with two outcome variables.Load the lavaan package at the beginning of the session
library(lavaan)
This is lavaan 0.6-3
lavaan is BETA software! Please report any bugs.
The data file is read in, columns are named, and missing values are specified.
dat <- read.csv("../../data/job_placement.csv", header = FALSE)
colnames(dat) <- c("id", "wjcalc", "wjspl", "wratspl", "wratcalc",
"waiscalc", "waisspl", "edlevel", "newschl",
"suspend", "expelled", "haveld", "gender", "age")
dat[dat == 99999] <- NA
dat$gender <- as.factor(dat$gender)
levels(dat$gender) <- c("Male", "Female")
Using the base-R function “lm”, we can fit a regression model where wjspl is predicted by numerous observed variables.
lmout <- lm(wjspl ~ edlevel + newschl + suspend +
expelled + haveld + gender +
age, data = dat)
summary(lmout)
Call:
lm(formula = wjspl ~ edlevel + newschl + suspend + expelled +
haveld + gender + age, data = dat)
Residuals:
Min 1Q Median 3Q Max
-20.881 -4.555 -0.048 4.912 18.254
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.95323 4.47105 4.910 1.49e-06 ***
edlevel 1.16208 0.32851 3.537 0.000467 ***
newschl 0.06333 0.75673 0.084 0.933361
suspend -0.05158 0.78317 -0.066 0.947529
expelled -2.75762 1.11277 -2.478 0.013747 *
haveld -6.97447 1.00036 -6.972 1.94e-11 ***
genderFemale 0.72027 0.80269 0.897 0.370256
age 0.41172 0.20920 1.968 0.049961 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.399 on 305 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.2207, Adjusted R-squared: 0.2028
F-statistic: 12.34 on 7 and 305 DF, p-value: 6.64e-14
We can build the same model as before and save it as a character string for lavaan.
MLRModel <- 'wjspl ~ edlevel + newschl + suspend +
expelled + haveld + gender +
age
wjspl ~ 1'
This fits the model using the sem function and returns an output summary. If we compare the results of this model to the previous model, we can see that the slope and intercept estimates are the same. In addition, “standardized = TRUE” command in summary() function will show standardized parameter estimates. In the next example, the regression model will include two outcome variables and the correlation between the outcomes is estimated - in this case a standardized correlation estimate is more intuitive to understand.
output <- sem(model = MLRModel, data = dat)
summary(output, standardized = TRUE)
lavaan 0.6-3 ended normally after 58 iterations
Optimization method NLMINB
Number of free parameters 9
Used Total
Number of observations 313 322
Estimator ML
Model Fit Test Statistic 0.000
Degrees of freedom 0
Minimum Function Value 0.0000000000000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
wjspl ~
edlevel 1.162 0.324 3.584 0.000 1.162 0.197
newschl 0.063 0.747 0.085 0.932 0.063 0.004
suspend -0.052 0.773 -0.067 0.947 -0.052 -0.004
expelled -2.758 1.098 -2.510 0.012 -2.758 -0.134
haveld -6.974 0.987 -7.063 0.000 -6.974 -0.354
gender 0.720 0.792 0.909 0.363 0.720 0.047
age 0.412 0.207 1.994 0.046 0.412 0.110
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wjspl 21.233 4.596 4.620 0.000 21.233 2.967
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wjspl 39.904 3.190 12.510 0.000 39.904 0.779
We can add more outcome variables to a regression model.
MLRModel.2 <- 'wjspl ~ edlevel + newschl + suspend +
expelled + haveld + gender +
age
wjspl ~ 1
wjcalc ~ edlevel + newschl + suspend +
expelled + haveld + gender +
age
wjcalc ~ 1'
output.2 <- sem(model = MLRModel.2, data = dat)
summary(output.2, standardized = TRUE)
lavaan 0.6-3 ended normally after 118 iterations
Optimization method NLMINB
Number of free parameters 19
Used Total
Number of observations 311 322
Estimator ML
Model Fit Test Statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
wjspl ~
edlevel 1.179 0.325 3.624 0.000 1.179 0.199
newschl 0.124 0.752 0.165 0.869 0.124 0.009
suspend -0.103 0.777 -0.132 0.895 -0.103 -0.007
expelled -2.736 1.100 -2.486 0.013 -2.736 -0.133
haveld -7.105 0.998 -7.120 0.000 -7.105 -0.358
gender 0.634 0.800 0.792 0.428 0.634 0.041
age 0.428 0.207 2.062 0.039 0.428 0.113
wjcalc ~
edlevel 1.295 0.208 6.225 0.000 1.295 0.340
newschl 0.214 0.481 0.446 0.656 0.214 0.023
suspend -0.914 0.497 -1.840 0.066 -0.914 -0.099
expelled -0.302 0.704 -0.430 0.667 -0.302 -0.023
haveld -1.180 0.638 -1.848 0.065 -1.180 -0.092
gender -0.138 0.512 -0.269 0.788 -0.138 -0.014
age 0.450 0.133 3.392 0.001 0.450 0.185
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wjspl ~~
.wjcalc 10.055 1.560 6.446 0.000 10.055 0.393
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wjspl 20.832 4.619 4.510 0.000 20.832 2.903
.wjcalc 1.277 2.955 0.432 0.666 1.277 0.277
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wjspl 40.023 3.210 12.470 0.000 40.023 0.777
.wjcalc 16.382 1.314 12.470 0.000 16.382 0.769
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lavaan_0.6-3 stationery_0.98.5.7
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 digest_0.6.18 plyr_1.8.4 xtable_1.8-3
[5] magrittr_1.5 stats4_3.5.1 evaluate_0.12 zip_1.0.0
[9] stringi_1.2.4 pbivnorm_0.6.0 openxlsx_4.1.0 rmarkdown_1.11
[13] tools_3.5.1 stringr_1.3.1 foreign_0.8-71 kutils_1.59
[17] yaml_2.2.0 xfun_0.4 compiler_3.5.1 mnormt_1.5-5
[21] htmltools_0.3.6 knitr_1.21
Available under Created Commons license 3.0