Multiple Linear Regression (MLR) Example in lavaan - 01

Ben Kite, Center for Research Methods and Data Analysis, University of Kansas <bakite@ku.edu>

Chong Xing, Center for Research Methods and Data Analysis, University of Kansas <cxing@ku.edu>

Please visit http://crmda.ku.edu/guides

Keywords: Linear Regression, Multiple Outcomes, R, lavaan


2019 January 24

Abstract

This guide outlines how to fit two linear regression models in R using the lavaan package. The results of the fisrt linear model (seven predictors and one outcome) fitted with lavaan are compared to the same model fitted with the lm function. The second model example demonstrates how to use lavaan to estimate a linear regression model with two outcome variables.

Package and Data

Load the lavaan package at the beginning of the session

library(lavaan)
This is lavaan 0.6-3
lavaan is BETA software! Please report any bugs.

The data file is read in, columns are named, and missing values are specified.

dat <- read.csv("../../data/job_placement.csv", header = FALSE)
colnames(dat) <- c("id", "wjcalc", "wjspl", "wratspl", "wratcalc",
                   "waiscalc", "waisspl", "edlevel", "newschl",
                   "suspend", "expelled", "haveld", "gender", "age")
dat[dat == 99999] <- NA
dat$gender <- as.factor(dat$gender)
levels(dat$gender) <- c("Male", "Female")

Regular Linear Regression Model

Using the base-R function “lm”, we can fit a regression model where wjspl is predicted by numerous observed variables.

lmout <- lm(wjspl ~ edlevel + newschl + suspend +
                    expelled + haveld + gender +
                    age, data = dat)
summary(lmout)

Call:
lm(formula = wjspl ~ edlevel + newschl + suspend + expelled + 
    haveld + gender + age, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.881  -4.555  -0.048   4.912  18.254 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  21.95323    4.47105   4.910 1.49e-06 ***
edlevel       1.16208    0.32851   3.537 0.000467 ***
newschl       0.06333    0.75673   0.084 0.933361    
suspend      -0.05158    0.78317  -0.066 0.947529    
expelled     -2.75762    1.11277  -2.478 0.013747 *  
haveld       -6.97447    1.00036  -6.972 1.94e-11 ***
genderFemale  0.72027    0.80269   0.897 0.370256    
age           0.41172    0.20920   1.968 0.049961 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.399 on 305 degrees of freedom
  (9 observations deleted due to missingness)
Multiple R-squared:  0.2207,    Adjusted R-squared:  0.2028 
F-statistic: 12.34 on 7 and 305 DF,  p-value: 6.64e-14

Linear Regression Model in lavaan

We can build the same model as before and save it as a character string for lavaan.

MLRModel <- 'wjspl ~ edlevel + newschl + suspend +
                     expelled + haveld + gender +
                     age

             wjspl ~ 1'

This fits the model using the sem function and returns an output summary. If we compare the results of this model to the previous model, we can see that the slope and intercept estimates are the same. In addition, “standardized = TRUE” command in summary() function will show standardized parameter estimates. In the next example, the regression model will include two outcome variables and the correlation between the outcomes is estimated - in this case a standardized correlation estimate is more intuitive to understand.

output <- sem(model = MLRModel, data = dat)
summary(output, standardized = TRUE)
lavaan 0.6-3 ended normally after 58 iterations

  Optimization method                           NLMINB
  Number of free parameters                          9

                                                  Used       Total
  Number of observations                           313         322

  Estimator                                         ML
  Model Fit Test Statistic                       0.000
  Degrees of freedom                                 0
  Minimum Function Value               0.0000000000000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  wjspl ~                                                               
    edlevel           1.162    0.324    3.584    0.000    1.162    0.197
    newschl           0.063    0.747    0.085    0.932    0.063    0.004
    suspend          -0.052    0.773   -0.067    0.947   -0.052   -0.004
    expelled         -2.758    1.098   -2.510    0.012   -2.758   -0.134
    haveld           -6.974    0.987   -7.063    0.000   -6.974   -0.354
    gender            0.720    0.792    0.909    0.363    0.720    0.047
    age               0.412    0.207    1.994    0.046    0.412    0.110

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wjspl            21.233    4.596    4.620    0.000   21.233    2.967

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wjspl            39.904    3.190   12.510    0.000   39.904    0.779

Linear Regression Model in lavaan - Adding another Outcome Variable

We can add more outcome variables to a regression model.

MLRModel.2 <- 'wjspl ~ edlevel + newschl + suspend +
                       expelled + haveld + gender +
                       age

               wjspl ~ 1

               wjcalc ~ edlevel + newschl + suspend +
                        expelled + haveld + gender +
                        age

               wjcalc ~ 1'
output.2 <- sem(model = MLRModel.2, data = dat)
summary(output.2, standardized = TRUE)
lavaan 0.6-3 ended normally after 118 iterations

  Optimization method                           NLMINB
  Number of free parameters                         19

                                                  Used       Total
  Number of observations                           311         322

  Estimator                                         ML
  Model Fit Test Statistic                       0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  wjspl ~                                                               
    edlevel           1.179    0.325    3.624    0.000    1.179    0.199
    newschl           0.124    0.752    0.165    0.869    0.124    0.009
    suspend          -0.103    0.777   -0.132    0.895   -0.103   -0.007
    expelled         -2.736    1.100   -2.486    0.013   -2.736   -0.133
    haveld           -7.105    0.998   -7.120    0.000   -7.105   -0.358
    gender            0.634    0.800    0.792    0.428    0.634    0.041
    age               0.428    0.207    2.062    0.039    0.428    0.113
  wjcalc ~                                                              
    edlevel           1.295    0.208    6.225    0.000    1.295    0.340
    newschl           0.214    0.481    0.446    0.656    0.214    0.023
    suspend          -0.914    0.497   -1.840    0.066   -0.914   -0.099
    expelled         -0.302    0.704   -0.430    0.667   -0.302   -0.023
    haveld           -1.180    0.638   -1.848    0.065   -1.180   -0.092
    gender           -0.138    0.512   -0.269    0.788   -0.138   -0.014
    age               0.450    0.133    3.392    0.001    0.450    0.185

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .wjspl ~~                                                              
   .wjcalc           10.055    1.560    6.446    0.000   10.055    0.393

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wjspl            20.832    4.619    4.510    0.000   20.832    2.903
   .wjcalc            1.277    2.955    0.432    0.666    1.277    0.277

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wjspl            40.023    3.210   12.470    0.000   40.023    0.777
   .wjcalc           16.382    1.314   12.470    0.000   16.382    0.769

Session Info

R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lavaan_0.6-3        stationery_0.98.5.7

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0      digest_0.6.18   plyr_1.8.4      xtable_1.8-3   
 [5] magrittr_1.5    stats4_3.5.1    evaluate_0.12   zip_1.0.0      
 [9] stringi_1.2.4   pbivnorm_0.6.0  openxlsx_4.1.0  rmarkdown_1.11 
[13] tools_3.5.1     stringr_1.3.1   foreign_0.8-71  kutils_1.59    
[17] yaml_2.2.0      xfun_0.4        compiler_3.5.1  mnormt_1.5-5   
[21] htmltools_0.3.6 knitr_1.21     

Available under Created Commons license 3.0 CC BY