Confirmatory Factor Analysis (CFA) Example in R

Package Installation

If you have not already installed the lavaan package, you will need to do that first. Here the command to install the package is commented out (preceded by “##”) because the package is already installed. The lavaan package needs to be loaded by using the “library(lavaan)” command.

##install.packages("lavaan")
library(lavaan)

This is lavaan 0.6-3

lavaan is BETA software! Please report any bugs.

##install.packages("kutils")
library(kutils)
##install.packages("semPlot")
library(semPlot)

Data Importation and Naming

The data file “job_placement” needs to be read in to the R session.

dat <- read.csv("../../data/job_placement.csv", header = FALSE)

Because the datafile does not have column (or variable) names, the variable names need to be specified.

colnames(dat) <- c("id", "wjcalc", "wjspl", "wratspl", "wratcalc", "waiscalc", "waisspl",
                   "edlevel", "newschl", "suspend", "expelled", "haveld", "female", "age")

In the original data file the missing values are coded as “99999”. These values need to be recoded to NA so that R recognizes them as missing.

dat[dat == 99999] <- NA

Model Building and Fitting

Now the CFA model that is to be tested needs to be specified as an R object. In lavaan the =~ operator is the exact same as the BY operator in Mplus. In the statements below the variables on the right of the =~ are indicators of the variable to the left of the =~. Also notice that there are + signs that separate the variables on the right side of the equation. The entire model statement needs to be wrapped in quotation marks.

CFAmodel <- 'MATH =~ wratcalc + wjcalc + waiscalc
             SPELL =~ wratspl + wjspl + waisspl'

Once the model is built, it can be fit to the data using the cfa function. The model object is the CFAmodel object that was specified above. The data object is the dat object that was imported and altered above. The last command is used to set the scale for the latent variable. The default value for std.lv is FALSE, by setting it to TRUE the first factor loading for each latent variable is freely estimated, the mean for each latent variable is set to 0, and the variance for each latent variable is set to 1.

CFAmodel.fitted <- cfa(model = CFAmodel, data = dat, std.lv = TRUE,
                       missing = "fiml", mimic = "Mplus")
summary(CFAmodel.fitted)

lavaan 0.6-3 ended normally after 50 iterations

  Optimization method                           NLMINB
  Number of free parameters                         19

  Number of observations                           322
  Number of missing patterns                         4

  Estimator                                         ML
  Model Fit Test Statistic                       9.540
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.299

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  MATH =~                                             
    wratcalc          6.041    0.276   21.921    0.000
    wjcalc            4.144    0.203   20.370    0.000
    waiscalc          2.410    0.165   14.636    0.000
  SPELL =~                                            
    wratspl           6.532    0.288   22.645    0.000
    wjspl             6.809    0.296   23.025    0.000
    waisspl           6.354    0.283   22.463    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  MATH ~~                                             
    SPELL             0.553    0.042   13.191    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .wratcalc         38.922    0.355  109.514    0.000
   .wjcalc           23.812    0.255   93.297    0.000
   .waiscalc         11.022    0.186   59.230    0.000
   .wratspl          36.484    0.385   94.751    0.000
   .wjspl            41.674    0.398  104.808    0.000
   .waisspl          37.163    0.376   98.788    0.000
    MATH              0.000                           
    SPELL             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .wratcalc          4.179    1.014    4.122    0.000
   .wjcalc            3.769    0.537    7.017    0.000
   .waiscalc          5.304    0.457   11.596    0.000
   .wratspl           5.053    0.612    8.256    0.000
   .wjspl             4.541    0.616    7.369    0.000
   .waisspl           5.156    0.599    8.605    0.000
    MATH              1.000                           
    SPELL             1.000

Examining Output

We can request standardized output columns by adding standardized = TRUE. The Std.lv column is the parameter estimates when just latent variables are standardized (they already were standardized, so this column tells us nothing new), whereas the Std.all column is the parameter estimates when all variables are standardized.

summary(CFAmodel.fitted, standardized = TRUE)

lavaan 0.6-3 ended normally after 50 iterations

  Optimization method                           NLMINB
  Number of free parameters                         19

  Number of observations                           322
  Number of missing patterns                         4

  Estimator                                         ML
  Model Fit Test Statistic                       9.540
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.299

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  MATH =~                                                               
    wratcalc          6.041    0.276   21.921    0.000    6.041    0.947
    wjcalc            4.144    0.203   20.370    0.000    4.144    0.906
    waiscalc          2.410    0.165   14.636    0.000    2.410    0.723
  SPELL =~                                                              
    wratspl           6.532    0.288   22.645    0.000    6.532    0.946
    wjspl             6.809    0.296   23.025    0.000    6.809    0.954
    waisspl           6.354    0.283   22.463    0.000    6.354    0.942

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  MATH ~~                                                               
    SPELL             0.553    0.042   13.191    0.000    0.553    0.553

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wratcalc         38.922    0.355  109.514    0.000   38.922    6.103
   .wjcalc           23.812    0.255   93.297    0.000   23.812    5.203
   .waiscalc         11.022    0.186   59.230    0.000   11.022    3.306
   .wratspl          36.484    0.385   94.751    0.000   36.484    5.282
   .wjspl            41.674    0.398  104.808    0.000   41.674    5.841
   .waisspl          37.163    0.376   98.788    0.000   37.163    5.508
    MATH              0.000                               0.000    0.000
    SPELL             0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wratcalc          4.179    1.014    4.122    0.000    4.179    0.103
   .wjcalc            3.769    0.537    7.017    0.000    3.769    0.180
   .waiscalc          5.304    0.457   11.596    0.000    5.304    0.477
   .wratspl           5.053    0.612    8.256    0.000    5.053    0.106
   .wjspl             4.541    0.616    7.369    0.000    4.541    0.089
   .waisspl           5.156    0.599    8.605    0.000    5.156    0.113
    MATH              1.000                               1.000    1.000
    SPELL             1.000                               1.000    1.000

We can request the fit measures along with the parameter estimates. This makes the output resemble what is returned by Mplus.

summary(CFAmodel.fitted, standardized = TRUE, fit = TRUE)

lavaan 0.6-3 ended normally after 50 iterations

  Optimization method                           NLMINB
  Number of free parameters                         19

  Number of observations                           322
  Number of missing patterns                         4

  Estimator                                         ML
  Model Fit Test Statistic                       9.540
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.299

Model test baseline model:

  Minimum Function Test Statistic             1882.335
  Degrees of freedom                                15
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.999
  Tucker-Lewis Index (TLI)                       0.998

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -5127.830
  Loglikelihood unrestricted model (H1)      -5123.060

  Number of free parameters                         19
  Akaike (AIC)                               10293.661
  Bayesian (BIC)                             10365.377
  Sample-size adjusted Bayesian (BIC)        10305.112

Root Mean Square Error of Approximation:

  RMSEA                                          0.024
  90 Percent Confidence Interval          0.000  0.073
  P-value RMSEA <= 0.05                          0.761

Standardized Root Mean Square Residual:

  SRMR                                           0.024

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  MATH =~                                                               
    wratcalc          6.041    0.276   21.921    0.000    6.041    0.947
    wjcalc            4.144    0.203   20.370    0.000    4.144    0.906
    waiscalc          2.410    0.165   14.636    0.000    2.410    0.723
  SPELL =~                                                              
    wratspl           6.532    0.288   22.645    0.000    6.532    0.946
    wjspl             6.809    0.296   23.025    0.000    6.809    0.954
    waisspl           6.354    0.283   22.463    0.000    6.354    0.942

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  MATH ~~                                                               
    SPELL             0.553    0.042   13.191    0.000    0.553    0.553

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wratcalc         38.922    0.355  109.514    0.000   38.922    6.103
   .wjcalc           23.812    0.255   93.297    0.000   23.812    5.203
   .waiscalc         11.022    0.186   59.230    0.000   11.022    3.306
   .wratspl          36.484    0.385   94.751    0.000   36.484    5.282
   .wjspl            41.674    0.398  104.808    0.000   41.674    5.841
   .waisspl          37.163    0.376   98.788    0.000   37.163    5.508
    MATH              0.000                               0.000    0.000
    SPELL             0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wratcalc          4.179    1.014    4.122    0.000    4.179    0.103
   .wjcalc            3.769    0.537    7.017    0.000    3.769    0.180
   .waiscalc          5.304    0.457   11.596    0.000    5.304    0.477
   .wratspl           5.053    0.612    8.256    0.000    5.053    0.106
   .wjspl             4.541    0.616    7.369    0.000    4.541    0.089
   .waisspl           5.156    0.599    8.605    0.000    5.156    0.113
    MATH              1.000                               1.000    1.000
    SPELL             1.000                               1.000    1.000

What if we only want certain fit measures? Those can be assigned and viewed using the commands below.

fitStats <- fitMeasures(CFAmodel.fitted, c("cfi", "rmsea"))
fitStats

  cfi rmsea 
0.999 0.024

We can pull out the parameter estimates instead of just looking at the summary. This can be useful when putting the parameter estimates in a table.

paramEsts <- parameterEstimates(CFAmodel.fitted)
paramEsts

        lhs op      rhs    est    se       z pvalue ci.lower ci.upper
1      MATH =~ wratcalc  6.041 0.276  21.921      0    5.501    6.581
2      MATH =~   wjcalc  4.144 0.203  20.370      0    3.745    4.543
3      MATH =~ waiscalc  2.410 0.165  14.636      0    2.088    2.733
4     SPELL =~  wratspl  6.532 0.288  22.645      0    5.967    7.097
5     SPELL =~    wjspl  6.809 0.296  23.025      0    6.230    7.389
6     SPELL =~  waisspl  6.354 0.283  22.463      0    5.799    6.908
7  wratcalc ~~ wratcalc  4.179 1.014   4.122      0    2.192    6.166
8    wjcalc ~~   wjcalc  3.769 0.537   7.017      0    2.716    4.821
9  waiscalc ~~ waiscalc  5.304 0.457  11.596      0    4.407    6.200
10  wratspl ~~  wratspl  5.053 0.612   8.256      0    3.853    6.253
11    wjspl ~~    wjspl  4.541 0.616   7.369      0    3.333    5.749
12  waisspl ~~  waisspl  5.156 0.599   8.605      0    3.982    6.330
13     MATH ~~     MATH  1.000 0.000      NA     NA    1.000    1.000
14    SPELL ~~    SPELL  1.000 0.000      NA     NA    1.000    1.000
15     MATH ~~    SPELL  0.553 0.042  13.191      0    0.470    0.635
16 wratcalc ~1          38.922 0.355 109.514      0   38.226   39.619
17   wjcalc ~1          23.812 0.255  93.297      0   23.311   24.312
18 waiscalc ~1          11.022 0.186  59.230      0   10.657   11.386
19  wratspl ~1          36.484 0.385  94.751      0   35.730   37.239
20    wjspl ~1          41.674 0.398 104.808      0   40.895   42.453
21  waisspl ~1          37.163 0.376  98.788      0   36.426   37.900
22     MATH ~1           0.000 0.000      NA     NA    0.000    0.000
23    SPELL ~1           0.000 0.000      NA     NA    0.000    0.000

We can also pull out the factor scores by using the lavPredict function on the testOutput object.

fscores <- lavPredict(CFAmodel.fitted)
head(fscores)

             MATH      SPELL
[1,]  1.378216628  1.3727471
[2,]  1.812068411  2.0974157
[3,] -0.006094102  0.1972874
[4,]  1.400963259 -0.2650847
[5,]  1.768911802 -0.7860667
[6,]  0.549146289  0.3652424

The semPaths function from the semPlot package provides an easy way to produce a path diagram.

library(semPlot)
semPaths(CFAmodel.fitted)

We can change some of the settings to make the path diagram into something we like a bit more.

semPaths(CFAmodel.fitted, what = "paths", intercepts = FALSE, sizeMan = 12, sizeMan2 = 8,
         layout = "tree2", sizeLat = 20, sizeLat2 = 10, width = 5, height = 1,
         label.cex = 1, nCharNodes = 0, curve = 2.5, label.scale = FALSE)

We can ask for the parameter estimates to be placed on the paths directly with the whatLabels = "est" argument. We also use the “edge.label.cex = 1.2” argument to make the values larger than the default.

semPaths(CFAmodel.fitted, what = "paths", whatLabels = "est", intercepts = FALSE,
         sizeMan = 12, sizeMan2 = 8,layout = "tree2", sizeLat = 20, sizeLat2 = 10,
         width = 5, height = 3, label.cex = 1, nCharNodes = 0, curve = 2.5,
         label.scale = FALSE, edge.label.cex = 1.2)

Lastly, the kutils function semTable can be used to produce a table summarizing the model parameter estimates in latex or html code. In this example we produce an html table that is automatically built by Rmarkdown.

semTable(CFAmodel.fitted, type = "html")

In R your output will look like this:

<table style="padding-right:20px;padding-left:20px;">
<tr><td></td><td colspan = '4'; align = 'center'>Model</td></tr> 
<tr><td></td><td colspan = '1'; align = 'center'>Estimate</td><td colspan = '1'; align = 'center'>Std. Err.</td><td colspan = '1'; align = 'center'>z</td><td colspan = '1'; align = 'center'>p</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Factor Loadings</span></td></tr> <tr><td colspan = '1'; align = 'left'><span style="text-decoration: underline;">MATH</span></td></tr>
<tr><td>wratcalc</td><td>6.04</td><td>0.28</td><td>21.92</td><td>.000</td></tr>
<tr><td>wjcalc</td><td>4.14</td><td>0.20</td><td>20.37</td><td>.000</td></tr>
<tr><td>waiscalc</td><td>2.41</td><td>0.16</td><td>14.64</td><td>.000</td></tr>
 <tr><td colspan = '1'; align = 'left'><span style="text-decoration: underline;">SPELL</span></td></tr>
<tr><td>wratspl</td><td>6.53</td><td>0.29</td><td>22.64</td><td>.000</td></tr>
<tr><td>wjspl</td><td>6.81</td><td>0.30</td><td>23.03</td><td>.000</td></tr>
<tr><td>waisspl</td><td>6.35</td><td>0.28</td><td>22.46</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Intercepts</span></td></tr>
<tr><td>wratcalc</td><td>38.92</td><td>0.36</td><td>109.51</td><td>.000</td></tr>
<tr><td>wjcalc</td><td>23.81</td><td>0.26</td><td>93.30</td><td>.000</td></tr>
<tr><td>waiscalc</td><td>11.02</td><td>0.19</td><td>59.23</td><td>.000</td></tr>
<tr><td>wratspl</td><td>36.48</td><td>0.39</td><td>94.75</td><td>.000</td></tr>
<tr><td>wjspl</td><td>41.67</td><td>0.40</td><td>104.81</td><td>.000</td></tr>
<tr><td>waisspl</td><td>37.16</td><td>0.38</td><td>98.79</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Residual Variances</span></td></tr>
<tr><td>wratcalc</td><td>4.18</td><td>1.01</td><td>4.12</td><td>.000</td></tr>
<tr><td>wjcalc</td><td>3.77</td><td>0.54</td><td>7.02</td><td>.000</td></tr>
<tr><td>waiscalc</td><td>5.30</td><td>0.46</td><td>11.60</td><td>.000</td></tr>
<tr><td>wratspl</td><td>5.05</td><td>0.61</td><td>8.26</td><td>.000</td></tr>
<tr><td>wjspl</td><td>4.54</td><td>0.62</td><td>7.37</td><td>.000</td></tr>
<tr><td>waisspl</td><td>5.16</td><td>0.60</td><td>8.60</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Intercepts</span></td></tr>
<tr><td>MATH</td><td>0.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td>SPELL</td><td>0.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Variances</span></td></tr>
<tr><td>MATH</td><td>1.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td>SPELL</td><td>1.00<sup>+</sup></td><td></td><td></td><td></td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Covariances</span></td></tr>
<tr><td>MATH w/SPELL</td><td>0.55</td><td>0.04</td><td>13.19</td><td>.000</td></tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Fit Indices</span></td></tr>
<tr><td>&chi;<sup>2</sup></td><td>9.54(8)</td><td></td><td></td><td>.299</td></tr>
<tr><td>CFI</td><td>1.00</td><td></td><td></td><td></td></tr>
<tr><td>TLI</td><td>1.00</td><td></td><td></td><td></td></tr>
<tr><td>RMSEA</td><td>0.02</td><td></td><td></td><td></td></tr>
<tr><td colspan = '5'; align = 'left'><sup>+</sup>Fixed parameter</td></tr>
</table><br>

If the file argument is used to specify a file to be saved, opening the resulting html file in a web browser will show a formatted table.

semTable(CFAmodel.fitted, type = "html", file = "CFAmodel.fitted.html")

	Model
	Estimate	Std. Err.	z	p
	Factor Loadings
MATH
wratcalc	6.04	0.28	21.92	.000
wjcalc	4.14	0.20	20.37	.000
waiscalc	2.41	0.16	14.64	.000
SPELL
wratspl	6.53	0.29	22.64	.000
wjspl	6.81	0.30	23.03	.000
waisspl	6.35	0.28	22.46	.000
	Intercepts
wratcalc	38.92	0.36	109.51	.000
wjcalc	23.81	0.26	93.30	.000
waiscalc	11.02	0.19	59.23	.000
wratspl	36.48	0.39	94.75	.000
wjspl	41.67	0.40	104.81	.000
waisspl	37.16	0.38	98.79	.000
	Residual Variances
wratcalc	4.18	1.01	4.12	.000
wjcalc	3.77	0.54	7.02	.000
waiscalc	5.30	0.46	11.60	.000
wratspl	5.05	0.61	8.26	.000
wjspl	4.54	0.62	7.37	.000
waisspl	5.16	0.60	8.60	.000
	Latent Intercepts
MATH	0.00⁺
SPELL	0.00⁺
	Latent Variances
MATH	1.00⁺
SPELL	1.00⁺
	Latent Covariances
MATH w/SPELL	0.55	0.04	13.19	.000
	Fit Indices
χ²	9.54(8)			.299
CFI	1.00
TLI	1.00
RMSEA	0.02
⁺Fixed parameter

The CFA Example in Mplus

Please click cfa-01.html if the reader would like to see the same CFA model fitted with Mplus.

Session Info

R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] semPlot_1.1         kutils_1.62         lavaan_0.6-3       
[4] stationery_0.98.5.7

loaded via a namespace (and not attached):
 [1] nlme_3.1-137        RColorBrewer_1.1-2  mi_1.0             
 [4] tools_3.5.1         backports_1.1.3     R6_2.3.0           
 [7] rpart_4.1-13        d3Network_0.5.2.1   Hmisc_4.2-0        
[10] lazyeval_0.2.1      colorspace_1.4-0    nnet_7.3-12        
[13] tidyselect_0.2.5    gridExtra_2.3       mnormt_1.5-5       
[16] compiler_3.5.1      qgraph_1.6          fdrtool_1.2.15     
[19] htmlTable_1.13.1    scales_1.0.0        checkmate_1.9.1    
[22] psych_1.8.12        pbapply_1.4-0       sem_3.1-9          
[25] stringr_1.3.1       digest_0.6.18       pbivnorm_0.6.0     
[28] foreign_0.8-71      minqa_1.2.4         rmarkdown_1.11     
[31] base64enc_0.1-3     jpeg_0.1-8          pkgconfig_2.0.2    
[34] htmltools_0.3.6     lme4_1.1-20         lisrelToR_0.1.4    
[37] htmlwidgets_1.3     rlang_0.3.1         rstudioapi_0.9.0   
[40] huge_1.2.7          bindr_0.1.1         gtools_3.8.1       
[43] acepack_1.4.1       dplyr_0.7.8         zip_1.0.0          
[46] magrittr_1.5        OpenMx_2.12.1       Formula_1.2-3      
[49] Matrix_1.2-15       Rcpp_1.0.0          munsell_0.5.0      
[52] abind_1.4-5         rockchalk_1.8.130   stringi_1.2.4      
[55] whisker_0.3-2       yaml_2.2.0          carData_3.0-2      
[58] MASS_7.3-51.1       plyr_1.8.4          matrixcalc_1.0-3   
[61] grid_3.5.1          parallel_3.5.1      crayon_1.3.4       
[64] lattice_0.20-38     splines_3.5.1       knitr_1.21         
[67] pillar_1.3.1        igraph_1.2.2        boot_1.3-20        
[70] rjson_0.2.20        corpcor_1.6.9       BDgraph_2.54       
[73] reshape2_1.4.3      stats4_3.5.1        XML_3.98-1.16      
[76] glue_1.3.0          evaluate_0.12       latticeExtra_0.6-28
[79] data.table_1.12.0   png_0.1-7           nloptr_1.2.1       
[82] gtable_0.2.0        purrr_0.3.0         assertthat_0.2.0   
[85] ggplot2_3.1.0       xfun_0.4            openxlsx_4.1.0     
[88] xtable_1.8-3        semTools_0.5-1      coda_0.19-2        
[91] survival_2.43-3     glasso_1.10         tibble_2.0.1       
[94] arm_1.10-1          ggm_2.3             bindrcpp_0.2.2     
[97] cluster_2.0.7-1

Available under Created Commons license 3.0 CC BY