Ben Kite, Center for Research Methods and Data Analysis, University of Kansas <bakite@ku.edu>
Please visit http://crmda.ku.edu/guides
Keywords: SEM, CFA, R, lavaan
Abstract
This guide outlines how to conduct a confirmatory factor analysis in R using the lavaan package. A basic example with 6 manifest variables measuring two latent factors is presented. The model estimation results can be compared to the same model fitted with Mplus.
If you have not already installed the lavaan package, you will need to do that first. Here the command to install the package is commented out (preceded by “##”) because the package is already installed. The lavaan package needs to be loaded by using the “library(lavaan)” command.
##install.packages("lavaan")
library(lavaan)
This is lavaan 0.6-3
lavaan is BETA software! Please report any bugs.
##install.packages("kutils")
library(kutils)
##install.packages("semPlot")
library(semPlot)
The data file “job_placement” needs to be read in to the R session.
dat <- read.csv("../../data/job_placement.csv", header = FALSE)
Because the datafile does not have column (or variable) names, the variable names need to be specified.
colnames(dat) <- c("id", "wjcalc", "wjspl", "wratspl", "wratcalc", "waiscalc", "waisspl",
"edlevel", "newschl", "suspend", "expelled", "haveld", "female", "age")
In the original data file the missing values are coded as “99999”. These values need to be recoded to NA so that R recognizes them as missing.
dat[dat == 99999] <- NA
Now the CFA model that is to be tested needs to be specified as an R object. In lavaan the =~
operator is the exact same as the BY
operator in Mplus. In the statements below the variables on the right of the =~
are indicators of the variable to the left of the =~
. Also notice that there are +
signs that separate the variables on the right side of the equation. The entire model statement needs to be wrapped in quotation marks.
CFAmodel <- 'MATH =~ wratcalc + wjcalc + waiscalc
SPELL =~ wratspl + wjspl + waisspl'
Once the model is built, it can be fit to the data using the cfa
function. The model object is the CFAmodel object that was specified above. The data object is the dat object that was imported and altered above. The last command is used to set the scale for the latent variable. The default value for std.lv
is FALSE
, by setting it to TRUE
the first factor loading for each latent variable is freely estimated, the mean for each latent variable is set to 0, and the variance for each latent variable is set to 1.
CFAmodel.fitted <- cfa(model = CFAmodel, data = dat, std.lv = TRUE,
missing = "fiml", mimic = "Mplus")
summary(CFAmodel.fitted)
lavaan 0.6-3 ended normally after 50 iterations
Optimization method NLMINB
Number of free parameters 19
Number of observations 322
Number of missing patterns 4
Estimator ML
Model Fit Test Statistic 9.540
Degrees of freedom 8
P-value (Chi-square) 0.299
Parameter Estimates:
Information Observed
Observed information based on Hessian
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
MATH =~
wratcalc 6.041 0.276 21.921 0.000
wjcalc 4.144 0.203 20.370 0.000
waiscalc 2.410 0.165 14.636 0.000
SPELL =~
wratspl 6.532 0.288 22.645 0.000
wjspl 6.809 0.296 23.025 0.000
waisspl 6.354 0.283 22.463 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
MATH ~~
SPELL 0.553 0.042 13.191 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
.wratcalc 38.922 0.355 109.514 0.000
.wjcalc 23.812 0.255 93.297 0.000
.waiscalc 11.022 0.186 59.230 0.000
.wratspl 36.484 0.385 94.751 0.000
.wjspl 41.674 0.398 104.808 0.000
.waisspl 37.163 0.376 98.788 0.000
MATH 0.000
SPELL 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.wratcalc 4.179 1.014 4.122 0.000
.wjcalc 3.769 0.537 7.017 0.000
.waiscalc 5.304 0.457 11.596 0.000
.wratspl 5.053 0.612 8.256 0.000
.wjspl 4.541 0.616 7.369 0.000
.waisspl 5.156 0.599 8.605 0.000
MATH 1.000
SPELL 1.000
We can request standardized output columns by adding standardized = TRUE
. The Std.lv
column is the parameter estimates when just latent variables are standardized (they already were standardized, so this column tells us nothing new), whereas the Std.all
column is the parameter estimates when all variables are standardized.
summary(CFAmodel.fitted, standardized = TRUE)
lavaan 0.6-3 ended normally after 50 iterations
Optimization method NLMINB
Number of free parameters 19
Number of observations 322
Number of missing patterns 4
Estimator ML
Model Fit Test Statistic 9.540
Degrees of freedom 8
P-value (Chi-square) 0.299
Parameter Estimates:
Information Observed
Observed information based on Hessian
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
MATH =~
wratcalc 6.041 0.276 21.921 0.000 6.041 0.947
wjcalc 4.144 0.203 20.370 0.000 4.144 0.906
waiscalc 2.410 0.165 14.636 0.000 2.410 0.723
SPELL =~
wratspl 6.532 0.288 22.645 0.000 6.532 0.946
wjspl 6.809 0.296 23.025 0.000 6.809 0.954
waisspl 6.354 0.283 22.463 0.000 6.354 0.942
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
MATH ~~
SPELL 0.553 0.042 13.191 0.000 0.553 0.553
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wratcalc 38.922 0.355 109.514 0.000 38.922 6.103
.wjcalc 23.812 0.255 93.297 0.000 23.812 5.203
.waiscalc 11.022 0.186 59.230 0.000 11.022 3.306
.wratspl 36.484 0.385 94.751 0.000 36.484 5.282
.wjspl 41.674 0.398 104.808 0.000 41.674 5.841
.waisspl 37.163 0.376 98.788 0.000 37.163 5.508
MATH 0.000 0.000 0.000
SPELL 0.000 0.000 0.000
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wratcalc 4.179 1.014 4.122 0.000 4.179 0.103
.wjcalc 3.769 0.537 7.017 0.000 3.769 0.180
.waiscalc 5.304 0.457 11.596 0.000 5.304 0.477
.wratspl 5.053 0.612 8.256 0.000 5.053 0.106
.wjspl 4.541 0.616 7.369 0.000 4.541 0.089
.waisspl 5.156 0.599 8.605 0.000 5.156 0.113
MATH 1.000 1.000 1.000
SPELL 1.000 1.000 1.000
We can request the fit measures along with the parameter estimates. This makes the output resemble what is returned by Mplus.
summary(CFAmodel.fitted, standardized = TRUE, fit = TRUE)
lavaan 0.6-3 ended normally after 50 iterations
Optimization method NLMINB
Number of free parameters 19
Number of observations 322
Number of missing patterns 4
Estimator ML
Model Fit Test Statistic 9.540
Degrees of freedom 8
P-value (Chi-square) 0.299
Model test baseline model:
Minimum Function Test Statistic 1882.335
Degrees of freedom 15
P-value 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.999
Tucker-Lewis Index (TLI) 0.998
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -5127.830
Loglikelihood unrestricted model (H1) -5123.060
Number of free parameters 19
Akaike (AIC) 10293.661
Bayesian (BIC) 10365.377
Sample-size adjusted Bayesian (BIC) 10305.112
Root Mean Square Error of Approximation:
RMSEA 0.024
90 Percent Confidence Interval 0.000 0.073
P-value RMSEA <= 0.05 0.761
Standardized Root Mean Square Residual:
SRMR 0.024
Parameter Estimates:
Information Observed
Observed information based on Hessian
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
MATH =~
wratcalc 6.041 0.276 21.921 0.000 6.041 0.947
wjcalc 4.144 0.203 20.370 0.000 4.144 0.906
waiscalc 2.410 0.165 14.636 0.000 2.410 0.723
SPELL =~
wratspl 6.532 0.288 22.645 0.000 6.532 0.946
wjspl 6.809 0.296 23.025 0.000 6.809 0.954
waisspl 6.354 0.283 22.463 0.000 6.354 0.942
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
MATH ~~
SPELL 0.553 0.042 13.191 0.000 0.553 0.553
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wratcalc 38.922 0.355 109.514 0.000 38.922 6.103
.wjcalc 23.812 0.255 93.297 0.000 23.812 5.203
.waiscalc 11.022 0.186 59.230 0.000 11.022 3.306
.wratspl 36.484 0.385 94.751 0.000 36.484 5.282
.wjspl 41.674 0.398 104.808 0.000 41.674 5.841
.waisspl 37.163 0.376 98.788 0.000 37.163 5.508
MATH 0.000 0.000 0.000
SPELL 0.000 0.000 0.000
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.wratcalc 4.179 1.014 4.122 0.000 4.179 0.103
.wjcalc 3.769 0.537 7.017 0.000 3.769 0.180
.waiscalc 5.304 0.457 11.596 0.000 5.304 0.477
.wratspl 5.053 0.612 8.256 0.000 5.053 0.106
.wjspl 4.541 0.616 7.369 0.000 4.541 0.089
.waisspl 5.156 0.599 8.605 0.000 5.156 0.113
MATH 1.000 1.000 1.000
SPELL 1.000 1.000 1.000
What if we only want certain fit measures? Those can be assigned and viewed using the commands below.
fitStats <- fitMeasures(CFAmodel.fitted, c("cfi", "rmsea"))
fitStats
cfi rmsea
0.999 0.024
We can pull out the parameter estimates instead of just looking at the summary. This can be useful when putting the parameter estimates in a table.
paramEsts <- parameterEstimates(CFAmodel.fitted)
paramEsts
lhs op rhs est se z pvalue ci.lower ci.upper
1 MATH =~ wratcalc 6.041 0.276 21.921 0 5.501 6.581
2 MATH =~ wjcalc 4.144 0.203 20.370 0 3.745 4.543
3 MATH =~ waiscalc 2.410 0.165 14.636 0 2.088 2.733
4 SPELL =~ wratspl 6.532 0.288 22.645 0 5.967 7.097
5 SPELL =~ wjspl 6.809 0.296 23.025 0 6.230 7.389
6 SPELL =~ waisspl 6.354 0.283 22.463 0 5.799 6.908
7 wratcalc ~~ wratcalc 4.179 1.014 4.122 0 2.192 6.166
8 wjcalc ~~ wjcalc 3.769 0.537 7.017 0 2.716 4.821
9 waiscalc ~~ waiscalc 5.304 0.457 11.596 0 4.407 6.200
10 wratspl ~~ wratspl 5.053 0.612 8.256 0 3.853 6.253
11 wjspl ~~ wjspl 4.541 0.616 7.369 0 3.333 5.749
12 waisspl ~~ waisspl 5.156 0.599 8.605 0 3.982 6.330
13 MATH ~~ MATH 1.000 0.000 NA NA 1.000 1.000
14 SPELL ~~ SPELL 1.000 0.000 NA NA 1.000 1.000
15 MATH ~~ SPELL 0.553 0.042 13.191 0 0.470 0.635
16 wratcalc ~1 38.922 0.355 109.514 0 38.226 39.619
17 wjcalc ~1 23.812 0.255 93.297 0 23.311 24.312
18 waiscalc ~1 11.022 0.186 59.230 0 10.657 11.386
19 wratspl ~1 36.484 0.385 94.751 0 35.730 37.239
20 wjspl ~1 41.674 0.398 104.808 0 40.895 42.453
21 waisspl ~1 37.163 0.376 98.788 0 36.426 37.900
22 MATH ~1 0.000 0.000 NA NA 0.000 0.000
23 SPELL ~1 0.000 0.000 NA NA 0.000 0.000
We can also pull out the factor scores by using the lavPredict
function on the testOutput object.
fscores <- lavPredict(CFAmodel.fitted)
head(fscores)
MATH SPELL
[1,] 1.378216628 1.3727471
[2,] 1.812068411 2.0974157
[3,] -0.006094102 0.1972874
[4,] 1.400963259 -0.2650847
[5,] 1.768911802 -0.7860667
[6,] 0.549146289 0.3652424
The semPaths
function from the semPlot package provides an easy way to produce a path diagram.
library(semPlot)
semPaths(CFAmodel.fitted)
We can change some of the settings to make the path diagram into something we like a bit more.
semPaths(CFAmodel.fitted, what = "paths", intercepts = FALSE, sizeMan = 12, sizeMan2 = 8,
layout = "tree2", sizeLat = 20, sizeLat2 = 10, width = 5, height = 1,
label.cex = 1, nCharNodes = 0, curve = 2.5, label.scale = FALSE)
We can ask for the parameter estimates to be placed on the paths directly with the whatLabels = "est"
argument. We also use the “edge.label.cex = 1.2” argument to make the values larger than the default.
semPaths(CFAmodel.fitted, what = "paths", whatLabels = "est", intercepts = FALSE,
sizeMan = 12, sizeMan2 = 8,layout = "tree2", sizeLat = 20, sizeLat2 = 10,
width = 5, height = 3, label.cex = 1, nCharNodes = 0, curve = 2.5,
label.scale = FALSE, edge.label.cex = 1.2)
Lastly, the kutils function semTable
can be used to produce a table summarizing the model parameter estimates in latex or html code. In this example we produce an html table that is automatically built by Rmarkdown.
semTable(CFAmodel.fitted, type = "html")
In R your output will look like this:
<table style="padding-right:20px;padding-left:20px;">
<tr><td></td><td colspan = '4'; align = 'center'>Model</td></tr>
<tr><td></td><td colspan = '1'; align = 'center'>Estimate</td><td colspan = '1'; align = 'center'>Std. Err.</td><td colspan = '1'; align = 'center'>z</td><td colspan = '1'; align = 'center'>p</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Factor Loadings</span></td></tr> <tr><td colspan = '1'; align = 'left'><span style="text-decoration: underline;">MATH</span></td></tr>
<tr><td>wratcalc</td><td>6.04</td><td>0.28</td><td>21.92</td><td>.000</td></tr>
<tr><td>wjcalc</td><td>4.14</td><td>0.20</td><td>20.37</td><td>.000</td></tr>
<tr><td>waiscalc</td><td>2.41</td><td>0.16</td><td>14.64</td><td>.000</td></tr>
<tr><td colspan = '1'; align = 'left'><span style="text-decoration: underline;">SPELL</span></td></tr>
<tr><td>wratspl</td><td>6.53</td><td>0.29</td><td>22.64</td><td>.000</td></tr>
<tr><td>wjspl</td><td>6.81</td><td>0.30</td><td>23.03</td><td>.000</td></tr>
<tr><td>waisspl</td><td>6.35</td><td>0.28</td><td>22.46</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Intercepts</span></td></tr>
<tr><td>wratcalc</td><td>38.92</td><td>0.36</td><td>109.51</td><td>.000</td></tr>
<tr><td>wjcalc</td><td>23.81</td><td>0.26</td><td>93.30</td><td>.000</td></tr>
<tr><td>waiscalc</td><td>11.02</td><td>0.19</td><td>59.23</td><td>.000</td></tr>
<tr><td>wratspl</td><td>36.48</td><td>0.39</td><td>94.75</td><td>.000</td></tr>
<tr><td>wjspl</td><td>41.67</td><td>0.40</td><td>104.81</td><td>.000</td></tr>
<tr><td>waisspl</td><td>37.16</td><td>0.38</td><td>98.79</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Residual Variances</span></td></tr>
<tr><td>wratcalc</td><td>4.18</td><td>1.01</td><td>4.12</td><td>.000</td></tr>
<tr><td>wjcalc</td><td>3.77</td><td>0.54</td><td>7.02</td><td>.000</td></tr>
<tr><td>waiscalc</td><td>5.30</td><td>0.46</td><td>11.60</td><td>.000</td></tr>
<tr><td>wratspl</td><td>5.05</td><td>0.61</td><td>8.26</td><td>.000</td></tr>
<tr><td>wjspl</td><td>4.54</td><td>0.62</td><td>7.37</td><td>.000</td></tr>
<tr><td>waisspl</td><td>5.16</td><td>0.60</td><td>8.60</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Intercepts</span></td></tr>
<tr><td>MATH</td><td>0.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td>SPELL</td><td>0.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Variances</span></td></tr>
<tr><td>MATH</td><td>1.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td>SPELL</td><td>1.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Covariances</span></td></tr>
<tr><td>MATH w/SPELL</td><td>0.55</td><td>0.04</td><td>13.19</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Fit Indices</span></td></tr>
<tr><td>χ<sup>2</sup></td><td>9.54(8)</td><td></td><td></td><td>.299</td></tr>
<tr><td>CFI</td><td>1.00</td><td></td><td></td><td></td></tr>
<tr><td>TLI</td><td>1.00</td><td></td><td></td><td></td></tr>
<tr><td>RMSEA</td><td>0.02</td><td></td><td></td><td></td></tr>
<tr><td colspan = '5'; align = 'left'><sup>+</sup>Fixed parameter</td></tr>
</table><br>
If the file
argument is used to specify a file to be saved, opening the resulting html file in a web browser will show a formatted table.
semTable(CFAmodel.fitted, type = "html", file = "CFAmodel.fitted.html")
Model | ||||
Estimate | Std. Err. | z | p | |
Factor Loadings | ||||
MATH | ||||
wratcalc | 6.04 | 0.28 | 21.92 | .000 |
wjcalc | 4.14 | 0.20 | 20.37 | .000 |
waiscalc | 2.41 | 0.16 | 14.64 | .000 |
SPELL | ||||
wratspl | 6.53 | 0.29 | 22.64 | .000 |
wjspl | 6.81 | 0.30 | 23.03 | .000 |
waisspl | 6.35 | 0.28 | 22.46 | .000 |
Intercepts | ||||
wratcalc | 38.92 | 0.36 | 109.51 | .000 |
wjcalc | 23.81 | 0.26 | 93.30 | .000 |
waiscalc | 11.02 | 0.19 | 59.23 | .000 |
wratspl | 36.48 | 0.39 | 94.75 | .000 |
wjspl | 41.67 | 0.40 | 104.81 | .000 |
waisspl | 37.16 | 0.38 | 98.79 | .000 |
Residual Variances | ||||
wratcalc | 4.18 | 1.01 | 4.12 | .000 |
wjcalc | 3.77 | 0.54 | 7.02 | .000 |
waiscalc | 5.30 | 0.46 | 11.60 | .000 |
wratspl | 5.05 | 0.61 | 8.26 | .000 |
wjspl | 4.54 | 0.62 | 7.37 | .000 |
waisspl | 5.16 | 0.60 | 8.60 | .000 |
Latent Intercepts | ||||
MATH | 0.00+ | |||
SPELL | 0.00+ | |||
Latent Variances | ||||
MATH | 1.00+ | |||
SPELL | 1.00+ | |||
Latent Covariances | ||||
MATH w/SPELL | 0.55 | 0.04 | 13.19 | .000 |
Fit Indices | ||||
χ2 | 9.54(8) | .299 | ||
CFI | 1.00 | |||
TLI | 1.00 | |||
RMSEA | 0.02 | |||
+Fixed parameter |
Please click cfa-01.html if the reader would like to see the same CFA model fitted with Mplus.
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] semPlot_1.1 kutils_1.62 lavaan_0.6-3
[4] stationery_0.98.5.7
loaded via a namespace (and not attached):
[1] nlme_3.1-137 RColorBrewer_1.1-2 mi_1.0
[4] tools_3.5.1 backports_1.1.3 R6_2.3.0
[7] rpart_4.1-13 d3Network_0.5.2.1 Hmisc_4.2-0
[10] lazyeval_0.2.1 colorspace_1.4-0 nnet_7.3-12
[13] tidyselect_0.2.5 gridExtra_2.3 mnormt_1.5-5
[16] compiler_3.5.1 qgraph_1.6 fdrtool_1.2.15
[19] htmlTable_1.13.1 scales_1.0.0 checkmate_1.9.1
[22] psych_1.8.12 pbapply_1.4-0 sem_3.1-9
[25] stringr_1.3.1 digest_0.6.18 pbivnorm_0.6.0
[28] foreign_0.8-71 minqa_1.2.4 rmarkdown_1.11
[31] base64enc_0.1-3 jpeg_0.1-8 pkgconfig_2.0.2
[34] htmltools_0.3.6 lme4_1.1-20 lisrelToR_0.1.4
[37] htmlwidgets_1.3 rlang_0.3.1 rstudioapi_0.9.0
[40] huge_1.2.7 bindr_0.1.1 gtools_3.8.1
[43] acepack_1.4.1 dplyr_0.7.8 zip_1.0.0
[46] magrittr_1.5 OpenMx_2.12.1 Formula_1.2-3
[49] Matrix_1.2-15 Rcpp_1.0.0 munsell_0.5.0
[52] abind_1.4-5 rockchalk_1.8.130 stringi_1.2.4
[55] whisker_0.3-2 yaml_2.2.0 carData_3.0-2
[58] MASS_7.3-51.1 plyr_1.8.4 matrixcalc_1.0-3
[61] grid_3.5.1 parallel_3.5.1 crayon_1.3.4
[64] lattice_0.20-38 splines_3.5.1 knitr_1.21
[67] pillar_1.3.1 igraph_1.2.2 boot_1.3-20
[70] rjson_0.2.20 corpcor_1.6.9 BDgraph_2.54
[73] reshape2_1.4.3 stats4_3.5.1 XML_3.98-1.16
[76] glue_1.3.0 evaluate_0.12 latticeExtra_0.6-28
[79] data.table_1.12.0 png_0.1-7 nloptr_1.2.1
[82] gtable_0.2.0 purrr_0.3.0 assertthat_0.2.0
[85] ggplot2_3.1.0 xfun_0.4 openxlsx_4.1.0
[88] xtable_1.8-3 semTools_0.5-1 coda_0.19-2
[91] survival_2.43-3 glasso_1.10 tibble_2.0.1
[94] arm_1.10-1 ggm_2.3 bindrcpp_0.2.2
[97] cluster_2.0.7-1
Available under Created Commons license 3.0