Confirmatory Factor Analysis (CFA) Example in R

If you have not already installed the lavaan package, you will need to do that first. Here the command to install the package is commented out (preceded by “##”) because the package is already installed. The lavaan package needs to be loaded by using the “library(lavaan)” command.

##install.packages("lavaan")
library(lavaan)
This is lavaan 0.5-23.1097
lavaan is BETA software! Please report any bugs.
##install.packages("kutils")
library(kutils)

The datafile “job_placement” needs to be read in to the R session.

dat <- read.csv("../../data/job_placement.csv", header = FALSE)

Because the datafile does not have column (or variable) names, the variable names need to be specified.

colnames(dat) <- c("id", "wjcalc", "wjspl", "wratspl", "wratcalc", "waiscalc", "waisspl", "edlevel", "newschl", "suspend", "expelled", "haveld", "female", "age")

In the original data file the missing values are coded as “99999”. These values need to be recoded to NA so that R recognizes them as missing.

dat[dat == 99999] <- NA

Now the CFA model that is to be tested needs to be specified as an R object. In lavaan the “=~” operator is the exact same as the BY operator in Mplus. In the statements below the variables on the right of the “=~” are indicators of the variable to the left of the “=~”. Also notice that there are “+” signs that separate the variables on the right side of the equation. The entire model statement needs to be wrapped in quotation marks.

CFAmodel <- "MATH =~ wratcalc + wjcalc + waiscalc
             SPELL =~ wratspl + wjspl + waisspl"

Once the model is built, it can be fit to the data using the “cfa” function. The model object is the CFAmodel object that was specified above. The data object is the dat object that was imported and altered above. The last command is used to set the scale for the latent variable. The default value for std.lv is “FALSE”, by setting it to TRUE the first factor loading for each latent variable is freely estimated, the mean for each latent variable is set to 0, and the variance for each latent variable is set to 1.

testOutput <- cfa(model = CFAmodel, data = dat, std.lv = TRUE, missing = "fiml")
summary(testOutput)
lavaan (0.5-23.1097) converged normally after  73 iterations

  Number of observations                           322

  Number of missing patterns                         4

  Estimator                                         ML
  Minimum Function Test Statistic                9.540
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.299

Parameter Estimates:

  Information                                 Observed
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  MATH =~                                             
    wratcalc          6.041    0.276   21.921    0.000
    wjcalc            4.144    0.203   20.370    0.000
    waiscalc          2.410    0.165   14.636    0.000
  SPELL =~                                            
    wratspl           6.532    0.288   22.645    0.000
    wjspl             6.809    0.296   23.025    0.000
    waisspl           6.354    0.283   22.463    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  MATH ~~                                             
    SPELL             0.553    0.042   13.191    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .wratcalc         38.922    0.355  109.514    0.000
   .wjcalc           23.812    0.255   93.297    0.000
   .waiscalc         11.022    0.186   59.230    0.000
   .wratspl          36.484    0.385   94.751    0.000
   .wjspl            41.674    0.398  104.808    0.000
   .waisspl          37.163    0.376   98.788    0.000
    MATH              0.000                           
    SPELL             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .wratcalc          4.179    1.014    4.122    0.000
   .wjcalc            3.769    0.537    7.017    0.000
   .waiscalc          5.304    0.457   11.596    0.000
   .wratspl           5.053    0.612    8.256    0.000
   .wjspl             4.541    0.616    7.369    0.000
   .waisspl           5.156    0.599    8.605    0.000
    MATH              1.000                           
    SPELL             1.000                           

We can request standardized output columns by adding “standardized = TRUE”. The Std.lv column is the parameter estimates when just latent variables are standardized (they already were standardized, so this column tells us nothing new), whereas the Std.all column is the parameter estimates when all variables are standardized.

summary(testOutput, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  73 iterations

  Number of observations                           322

  Number of missing patterns                         4

  Estimator                                         ML
  Minimum Function Test Statistic                9.540
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.299

Parameter Estimates:

  Information                                 Observed
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  MATH =~                                                               
    wratcalc          6.041    0.276   21.921    0.000    6.041    0.947
    wjcalc            4.144    0.203   20.370    0.000    4.144    0.906
    waiscalc          2.410    0.165   14.636    0.000    2.410    0.723
  SPELL =~                                                              
    wratspl           6.532    0.288   22.645    0.000    6.532    0.946
    wjspl             6.809    0.296   23.025    0.000    6.809    0.954
    waisspl           6.354    0.283   22.463    0.000    6.354    0.942

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  MATH ~~                                                               
    SPELL             0.553    0.042   13.191    0.000    0.553    0.553

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wratcalc         38.922    0.355  109.514    0.000   38.922    6.103
   .wjcalc           23.812    0.255   93.297    0.000   23.812    5.203
   .waiscalc         11.022    0.186   59.230    0.000   11.022    3.306
   .wratspl          36.484    0.385   94.751    0.000   36.484    5.282
   .wjspl            41.674    0.398  104.808    0.000   41.674    5.841
   .waisspl          37.163    0.376   98.788    0.000   37.163    5.508
    MATH              0.000                               0.000    0.000
    SPELL             0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wratcalc          4.179    1.014    4.122    0.000    4.179    0.103
   .wjcalc            3.769    0.537    7.017    0.000    3.769    0.180
   .waiscalc          5.304    0.457   11.596    0.000    5.304    0.477
   .wratspl           5.053    0.612    8.256    0.000    5.053    0.106
   .wjspl             4.541    0.616    7.369    0.000    4.541    0.089
   .waisspl           5.156    0.599    8.605    0.000    5.156    0.113
    MATH              1.000                               1.000    1.000
    SPELL             1.000                               1.000    1.000

We can request the fit measures along with the parameter estimates. This makes the output resemble what is returned by Mplus.

summary(testOutput, fit = TRUE)
lavaan (0.5-23.1097) converged normally after  73 iterations

  Number of observations                           322

  Number of missing patterns                         4

  Estimator                                         ML
  Minimum Function Test Statistic                9.540
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.299

Model test baseline model:

  Minimum Function Test Statistic             1882.335
  Degrees of freedom                                15
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.999
  Tucker-Lewis Index (TLI)                       0.998

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -5127.830
  Loglikelihood unrestricted model (H1)      -5123.060

  Number of free parameters                         19
  Akaike (AIC)                               10293.661
  Bayesian (BIC)                             10365.377
  Sample-size adjusted Bayesian (BIC)        10305.112

Root Mean Square Error of Approximation:

  RMSEA                                          0.024
  90 Percent Confidence Interval          0.000  0.073
  P-value RMSEA <= 0.05                          0.761

Standardized Root Mean Square Residual:

  SRMR                                           0.024

Parameter Estimates:

  Information                                 Observed
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  MATH =~                                             
    wratcalc          6.041    0.276   21.921    0.000
    wjcalc            4.144    0.203   20.370    0.000
    waiscalc          2.410    0.165   14.636    0.000
  SPELL =~                                            
    wratspl           6.532    0.288   22.645    0.000
    wjspl             6.809    0.296   23.025    0.000
    waisspl           6.354    0.283   22.463    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  MATH ~~                                             
    SPELL             0.553    0.042   13.191    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .wratcalc         38.922    0.355  109.514    0.000
   .wjcalc           23.812    0.255   93.297    0.000
   .waiscalc         11.022    0.186   59.230    0.000
   .wratspl          36.484    0.385   94.751    0.000
   .wjspl            41.674    0.398  104.808    0.000
   .waisspl          37.163    0.376   98.788    0.000
    MATH              0.000                           
    SPELL             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .wratcalc          4.179    1.014    4.122    0.000
   .wjcalc            3.769    0.537    7.017    0.000
   .waiscalc          5.304    0.457   11.596    0.000
   .wratspl           5.053    0.612    8.256    0.000
   .wjspl             4.541    0.616    7.369    0.000
   .waisspl           5.156    0.599    8.605    0.000
    MATH              1.000                           
    SPELL             1.000                           

What if we only want certain fit measures? Those can be assigned and viewed using the commands below.

fitStats <- fitMeasures(testOutput, c("cfi", "rmsea"))
fitStats
  cfi rmsea 
0.999 0.024 

We can pull out the parameter estimates instead of just looking at the summary. This can be useful when putting the parameter estimates in a table.

paramEsts <- parameterEstimates(testOutput)
paramEsts
        lhs op      rhs    est    se       z pvalue ci.lower ci.upper
1      MATH =~ wratcalc  6.041 0.276  21.921      0    5.501    6.581
2      MATH =~   wjcalc  4.144 0.203  20.370      0    3.745    4.543
3      MATH =~ waiscalc  2.410 0.165  14.636      0    2.088    2.733
4     SPELL =~  wratspl  6.532 0.288  22.645      0    5.967    7.097
5     SPELL =~    wjspl  6.809 0.296  23.025      0    6.230    7.389
6     SPELL =~  waisspl  6.354 0.283  22.463      0    5.799    6.908
7  wratcalc ~~ wratcalc  4.179 1.014   4.122      0    2.192    6.166
8    wjcalc ~~   wjcalc  3.769 0.537   7.017      0    2.716    4.821
9  waiscalc ~~ waiscalc  5.304 0.457  11.596      0    4.407    6.200
10  wratspl ~~  wratspl  5.053 0.612   8.256      0    3.853    6.253
11    wjspl ~~    wjspl  4.541 0.616   7.369      0    3.333    5.749
12  waisspl ~~  waisspl  5.156 0.599   8.605      0    3.982    6.330
13     MATH ~~     MATH  1.000 0.000      NA     NA    1.000    1.000
14    SPELL ~~    SPELL  1.000 0.000      NA     NA    1.000    1.000
15     MATH ~~    SPELL  0.553 0.042  13.191      0    0.470    0.635
16 wratcalc ~1          38.922 0.355 109.514      0   38.226   39.619
17   wjcalc ~1          23.812 0.255  93.297      0   23.311   24.312
18 waiscalc ~1          11.022 0.186  59.230      0   10.657   11.386
19  wratspl ~1          36.484 0.385  94.751      0   35.730   37.239
20    wjspl ~1          41.674 0.398 104.808      0   40.895   42.453
21  waisspl ~1          37.163 0.376  98.788      0   36.426   37.900
22     MATH ~1           0.000 0.000      NA     NA    0.000    0.000
23    SPELL ~1           0.000 0.000      NA     NA    0.000    0.000

We can also pull out the factor scores by using the predict function on the testOutput object.

fscores <- lavPredict(testOutput)
head(fscores)
             MATH      SPELL
[1,]  1.378216604  1.3727466
[2,]  1.812068035  2.0974152
[3,] -0.006094131  0.1972871
[4,]  1.400963105 -0.2650848
[5,]  1.768911827 -0.7860668
[6,]  0.549146077  0.3652421

The “semPaths” function from the semPlot package provides an easy way to produce a path diagram.

library(semPlot)
semPaths(testOutput)

We can change some of the settings to make the path diagram into something we like a bit more.

semPaths(testOutput, what = "paths", intercepts = FALSE, sizeMan = 12, sizeMan2 = 8,
         layout = "tree2", sizeLat = 20, sizeLat2 = 10, width = 5, height = 1,
         label.cex = 1, nCharNodes = 0, curve = 2.5, label.scale = FALSE)

We can ask for the parameter estimates to be placed on the paths directly with the “whatLabels =”est“” argument. We also use the “edge.label.cex = 1.2” argument to make the values larger than the default.

semPaths(testOutput, what = "paths", whatLabels = "est", intercepts = FALSE, 
         sizeMan = 12, sizeMan2 = 8,layout = "tree2", sizeLat = 20, sizeLat2 = 10, 
         width = 5, height = 3, label.cex = 1, nCharNodes = 0, curve = 2.5, 
         label.scale = FALSE, edge.label.cex = 1.2)

Lastly, the kutils function “semTable” can be used produce a table summarizing the model parameter estimates in latex or html code. In this example we produce an html table that is automatically built by Rmarkdown.

semTable(testOutput, type = "html")

In R your output will look like this:


<table>



<tr><td style="border-bottom: solid thin black; border-collapse:collapse;">&nbsp;Parameter</td> <td style="border-bottom: solid thin black; border-collapse:collapse;">&nbsp;Estimate</td> <td style="border-bottom: solid thin black; border-collapse:collapse;">&nbsp;SE</td> <td style="border-bottom: solid thin black; border-collapse:collapse;">&nbsp;z</td> <td style="border-bottom: solid thin black; border-collapse:collapse;">&nbsp;p</td></tr>

<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Factor Loadings</span></td></tr><tr><td><span style="text-decoration: underline;">MATH</span></td> <td></td> <td></td> <td></td> <td></td> </tr>
<tr><td>wratcalc </td><td> 6.04 </td><td> 0.28 </td><td> 21.92 </td><td> .000</tr>
<tr><td>wjcalc </td><td> 4.14 </td><td> 0.20 </td><td> 20.37 </td><td> .000</tr>
<tr><td>waiscalc </td><td> 2.41 </td><td> 0.16 </td><td> 14.64 </td><td> .000</tr>
<tr><td><span style="text-decoration: underline;">SPELL</span></td> <td></td> <td></td> <td></td> <td></td> </tr>
<tr><td>wratspl </td><td> 6.53 </td><td> 0.29 </td><td> 22.64 </td><td> .000</tr>
<tr><td>wjspl </td><td> 6.81 </td><td> 0.30 </td><td> 23.03 </td><td> .000</tr>
<tr><td>waisspl </td><td> 6.35 </td><td> 0.28 </td><td> 22.46 </td><td> .000</tr>
<tr><td></td> <td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Intercepts</span></td> </tr>
<tr><td>wratcalc </td><td> 38.92 </td><td> 0.36 </td><td> 109.51 </td><td> .000</tr>
<tr><td>wjcalc </td><td> 23.81 </td><td> 0.26 </td><td> 93.30 </td><td> .000</tr>
<tr><td>waiscalc </td><td> 11.02 </td><td> 0.19 </td><td> 59.23 </td><td> .000</tr>
<tr><td>wratspl </td><td> 36.48 </td><td> 0.39 </td><td> 94.75 </td><td> .000</tr>
<tr><td>wjspl </td><td> 41.67 </td><td> 0.40 </td><td> 104.81 </td><td> .000</tr>
<tr><td>waisspl </td><td> 37.16 </td><td> 0.38 </td><td> 98.79 </td><td> .000</tr>
<tr><td></td> <td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Residual Variances</span></td></tr>
<tr><td>wratcalc </td><td> 4.18 </td><td> 1.01 </td><td> 4.12 </td><td> .000</tr>
<tr><td>wjcalc </td><td> 3.77 </td><td> 0.54 </td><td> 7.02 </td><td> .000</tr>
<tr><td>waiscalc </td><td> 5.30 </td><td> 0.46 </td><td> 11.60 </td><td> .000</tr>
<tr><td>wratspl </td><td> 5.05 </td><td> 0.61 </td><td> 8.26 </td><td> .000</tr>
<tr><td>wjspl </td><td> 4.54 </td><td> 0.62 </td><td> 7.37 </td><td> .000</tr>
<tr><td>waisspl </td><td> 5.16 </td><td> 0.60 </td><td> 8.60 </td><td> .000</tr>
<tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Variances/Covariances</span></td></tr>
<tr><td>MATH with MATH </td><td> 1.00* </td><td> 0.00 </td><td>  </td><td> </tr>
<tr><td>SPELL with SPELL </td><td> 1.00* </td><td> 0.00 </td><td>  </td><td> </tr>
<tr><td>MATH with SPELL </td><td> 0.55 </td><td> 0.04 </td><td> 13.19 </td><td> .000</tr>
<tr><td></td> <td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Latent Means</span></td> </tr>
<tr><td>MATH </td><td> 0.00* </td><td> 0.00 </td><td>  </td><td> </tr>
<tr><td>SPELL </td><td> 0.00* </td><td> 0.00 </td><td>  </td><td> </tr>
<tr><td colspan = '5'; align = 'center'; ; style="border-top: solid thin black; border-collapse:collapse;">&nbsp;</tr>
</table>

Note. * Indicates parameters fixed for model identification.<br>&chi;(8) = 9.54, p = .299; CFI = 1.00; TLI = 1.00; RMSEA = 0.02.

If the “file” argument is used to specify a file to be saved, opening the resulting html file in a web browser will show a formatted table.

semTable(testOutput, type = "html", file = "testOutput.html")
 Parameter  Estimate  SE  z  p
Factor Loadings
MATH
wratcalc 6.04 0.28 21.92 .000
wjcalc 4.14 0.20 20.37 .000
waiscalc 2.41 0.16 14.64 .000
SPELL
wratspl 6.53 0.29 22.64 .000
wjspl 6.81 0.30 23.03 .000
waisspl 6.35 0.28 22.46 .000
Intercepts
wratcalc 38.92 0.36 109.51 .000
wjcalc 23.81 0.26 93.30 .000
waiscalc 11.02 0.19 59.23 .000
wratspl 36.48 0.39 94.75 .000
wjspl 41.67 0.40 104.81 .000
waisspl 37.16 0.38 98.79 .000
Residual Variances
wratcalc 4.18 1.01 4.12 .000
wjcalc 3.77 0.54 7.02 .000
waiscalc 5.30 0.46 11.60 .000
wratspl 5.05 0.61 8.26 .000
wjspl 4.54 0.62 7.37 .000
waisspl 5.16 0.60 8.60 .000
Latent Variances/Covariances
MATH with MATH 1.00* 0.00
SPELL with SPELL 1.00* 0.00
MATH with SPELL 0.55 0.04 13.19 .000
Latent Means
MATH 0.00* 0.00
SPELL 0.00* 0.00
 

Note. * Indicates parameters fixed for model identification.
χ(8) = 9.54, p = .299; CFI = 1.00; TLI = 1.00; RMSEA = 0.02.