Multiple Group Confirmatory Factor Analysis (CFA) Example in R

Load the lavaan package at the beginning of the session

library(lavaan)
## This is lavaan 0.5-23.1097
## lavaan is BETA software! Please report any bugs.

The data file is read in, columns are named, and missing values are specified.

dat <- read.csv("../../data/job_placement.csv", header = FALSE)
colnames(dat) <- c("id", "wjcalc", "wjspl", "wratspl", "wratcalc",
                   "waiscalc", "waisspl", "edlevel", "newschl",
                   "suspend", "expelled", "haveld", "female", "age")
dat[dat == 99999] <- NA

Now the CFA model that is to be tested needs to be specified as an R object. In lavaan the “=~” operator is the exact same as the BY operator in Mplus. In the statements below the variables on the right of the “=~” are indicators of the variable to the left of the “=~”. Also notice that there are “+” signs that separate the variables on the right side of the equation. The entire model statement needs to be wrapped in quotation marks.

ConfigModel <- "MATH =~ wratcalc + wjcalc + waiscalc
             SPELL =~ wratspl + wjspl + waisspl"
ConfigOutput <- cfa(model = ConfigModel, data = dat, group = "female",
                    std.lv = TRUE, missing = "fiml")
summary(ConfigOutput)
## lavaan (0.5-23.1097) converged normally after 118 iterations
## 
##   Number of observations per group         
##   1                                                101
##   0                                                221
## 
##   Number of missing patterns per group     
##   1                                                  4
##   0                                                  2
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               19.215
##   Degrees of freedom                                16
##   P-value (Chi-square)                           0.258
## 
## Chi-square for each group:
## 
##   1                                             11.555
##   0                                              7.660
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH =~                                             
##     wratcalc          6.545    0.496   13.187    0.000
##     wjcalc            4.215    0.366   11.530    0.000
##     waiscalc          2.290    0.276    8.306    0.000
##   SPELL =~                                            
##     wratspl           7.010    0.547   12.817    0.000
##     wjspl             6.833    0.518   13.182    0.000
##     waisspl           6.638    0.520   12.763    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH ~~                                             
##     SPELL             0.634    0.064    9.914    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .wratcalc         38.267    0.666   57.464    0.000
##    .wjcalc           23.603    0.465   50.781    0.000
##    .waiscalc         10.266    0.316   32.528    0.000
##    .wratspl          37.171    0.735   50.590    0.000
##    .wjspl            42.337    0.705   60.073    0.000
##    .waisspl          38.030    0.697   54.579    0.000
##     MATH              0.000                           
##     SPELL             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .wratcalc          1.957    1.621    1.208    0.227
##    .wjcalc            3.960    0.848    4.671    0.000
##    .waiscalc          4.765    0.714    6.678    0.000
##    .wratspl           5.307    1.105    4.804    0.000
##    .wjspl             3.474    0.910    3.818    0.000
##    .waisspl           4.900    1.007    4.865    0.000
##     MATH              1.000                           
##     SPELL             1.000                           
## 
## 
## Group 2 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH =~                                             
##     wratcalc          5.764    0.330   17.492    0.000
##     wjcalc            4.123    0.244   16.926    0.000
##     waiscalc          2.446    0.200   12.212    0.000
##   SPELL =~                                            
##     wratspl           6.278    0.337   18.646    0.000
##     wjspl             6.788    0.359   18.929    0.000
##     waisspl           6.177    0.335   18.441    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH ~~                                             
##     SPELL             0.520    0.053    9.762    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .wratcalc         39.222    0.417   93.962    0.000
##    .wjcalc           23.910    0.305   78.420    0.000
##    .waiscalc         11.367    0.226   50.320    0.000
##    .wratspl          36.172    0.448   80.762    0.000
##    .wjspl            41.371    0.480   86.157    0.000
##    .waisspl          36.767    0.444   82.899    0.000
##     MATH              0.000                           
##     SPELL             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .wratcalc          5.280    1.232    4.285    0.000
##    .wjcalc            3.549    0.664    5.342    0.000
##    .waiscalc          5.269    0.557    9.463    0.000
##    .wratspl           4.915    0.726    6.772    0.000
##    .wjspl             4.881    0.795    6.138    0.000
##    .waisspl           5.285    0.736    7.176    0.000
##     MATH              1.000                           
##     SPELL             1.000
metricModel <- "MATH =~ wratcalc + wjcalc + waiscalc
             SPELL =~ wratspl + wjspl + waisspl
             MATH ~~ c(1, NA)*MATH
             SPELL ~~ c(1, NA)*SPELL"
metricOutput <- cfa(model = metricModel, data = dat, std.lv = TRUE,
                    group = "female", group.equal = "loadings",
                    missing = "fiml")
summary(metricOutput)
## lavaan (0.5-23.1097) converged normally after  94 iterations
## 
##   Number of observations per group         
##   1                                                101
##   0                                                221
## 
##   Number of missing patterns per group     
##   1                                                  4
##   0                                                  2
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               25.871
##   Degrees of freedom                                20
##   P-value (Chi-square)                           0.170
## 
## Chi-square for each group:
## 
##   1                                             15.581
##   0                                             10.290
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH =~                                             
##     wratclc (.p1.)    6.359    0.490   12.972    0.000
##     wjcalc  (.p2.)    4.354    0.342   12.729    0.000
##     waisclc (.p3.)    2.502    0.229   10.909    0.000
##   SPELL =~                                            
##     wratspl (.p4.)    6.751    0.508   13.295    0.000
##     wjspl   (.p5.)    7.020    0.516   13.592    0.000
##     waisspl (.p6.)    6.568    0.493   13.326    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH ~~                                             
##     SPELL             0.641    0.063   10.164    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .wratcalc         38.267    0.656   58.319    0.000
##    .wjcalc           23.602    0.474   49.832    0.000
##    .waiscalc         10.263    0.330   31.085    0.000
##    .wratspl          37.172    0.713   52.136    0.000
##    .wjspl            42.337    0.721   58.719    0.000
##    .waisspl          38.031    0.691   55.052    0.000
##     MATH              0.000                           
##     SPELL             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     MATH              1.000                           
##     SPELL             1.000                           
##    .wratcalc          3.047    1.413    2.157    0.031
##    .wjcalc            3.603    0.778    4.632    0.000
##    .waiscalc          4.700    0.714    6.586    0.000
##    .wratspl           5.686    1.123    5.061    0.000
##    .wjspl             3.230    0.929    3.478    0.001
##    .waisspl           4.995    1.023    4.883    0.000
## 
## 
## Group 2 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH =~                                             
##     wratclc (.p1.)    6.359    0.490   12.972    0.000
##     wjcalc  (.p2.)    4.354    0.342   12.729    0.000
##     waisclc (.p3.)    2.502    0.229   10.909    0.000
##   SPELL =~                                            
##     wratspl (.p4.)    6.751    0.508   13.295    0.000
##     wjspl   (.p5.)    7.020    0.516   13.592    0.000
##     waisspl (.p6.)    6.568    0.493   13.326    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH ~~                                             
##     SPELL             0.456    0.089    5.103    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .wratcalc         39.222    0.422   92.880    0.000
##    .wjcalc           23.910    0.302   79.192    0.000
##    .waiscalc         11.367    0.221   51.462    0.000
##    .wratspl          36.172    0.454   79.673    0.000
##    .wjspl            41.371    0.472   87.637    0.000
##    .waisspl          36.767    0.446   82.522    0.000
##     MATH              0.000                           
##     SPELL             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     MATH              0.860    0.156    5.501    0.000
##     SPELL             0.895    0.158    5.676    0.000
##    .wratcalc          4.619    1.205    3.834    0.000
##    .wjcalc            3.835    0.654    5.864    0.000
##    .waiscalc          5.370    0.564    9.525    0.000
##    .wratspl           4.764    0.721    6.607    0.000
##    .wjspl             5.154    0.790    6.526    0.000
##    .waisspl           5.233    0.732    7.153    0.000
scalarModel <- "MATH =~ wratcalc + wjcalc + waiscalc
             SPELL =~ wratspl + wjspl + waisspl
             MATH ~~ c(1, NA)*MATH
             SPELL ~~ c(1, NA)*SPELL
             MATH ~ c(0, NA)*1
             SPELL ~ c(0, NA)*1 "
scalarOutput <- cfa(model = scalarModel, data = dat, std.lv = TRUE,
                    group = "female",
                    group.equal = c("loadings", "intercepts"),
                    missing = "fiml")
summary(scalarOutput)
## lavaan (0.5-23.1097) converged normally after  95 iterations
## 
##   Number of observations per group         
##   1                                                101
##   0                                                221
## 
##   Number of missing patterns per group     
##   1                                                  4
##   0                                                  2
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               36.362
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.051
## 
## Chi-square for each group:
## 
##   1                                             22.546
##   0                                             13.815
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH =~                                             
##     wratclc (.p1.)    6.381    0.491   12.994    0.000
##     wjcalc  (.p2.)    4.340    0.342   12.699    0.000
##     waisclc (.p3.)    2.529    0.232   10.895    0.000
##   SPELL =~                                            
##     wratspl (.p4.)    6.751    0.508   13.299    0.000
##     wjspl   (.p5.)    7.015    0.516   13.592    0.000
##     waisspl (.p6.)    6.578    0.494   13.329    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH ~~                                             
##     SPELL             0.640    0.063   10.128    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     MATH              0.000                           
##     SPELL             0.000                           
##    .wratclc (.18.)   38.273    0.653   58.628    0.000
##    .wjcalc  (.19.)   23.374    0.454   51.465    0.000
##    .waisclc (.20.)   10.751    0.289   37.239    0.000
##    .wratspl (.21.)   37.211    0.692   53.771    0.000
##    .wjspl   (.22.)   42.411    0.713   59.443    0.000
##    .waisspl (.23.)   37.869    0.675   56.144    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     MATH              1.000                           
##     SPELL             1.000                           
##    .wratcalc          2.841    1.424    1.995    0.046
##    .wjcalc            3.743    0.793    4.723    0.000
##    .waiscalc          4.977    0.766    6.496    0.000
##    .wratspl           5.681    1.123    5.056    0.000
##    .wjspl             3.237    0.930    3.481    0.000
##    .waisspl           5.035    1.032    4.880    0.000
## 
## 
## Group 2 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH =~                                             
##     wratclc (.p1.)    6.381    0.491   12.994    0.000
##     wjcalc  (.p2.)    4.340    0.342   12.699    0.000
##     waisclc (.p3.)    2.529    0.232   10.895    0.000
##   SPELL =~                                            
##     wratspl (.p4.)    6.751    0.508   13.299    0.000
##     wjspl   (.p5.)    7.015    0.516   13.592    0.000
##     waisspl (.p6.)    6.578    0.494   13.329    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MATH ~~                                             
##     SPELL             0.454    0.089    5.104    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     MATH              0.148    0.121    1.220    0.222
##     SPELL            -0.156    0.121   -1.293    0.196
##    .wratclc (.18.)   38.273    0.653   58.628    0.000
##    .wjcalc  (.19.)   23.374    0.454   51.465    0.000
##    .waisclc (.20.)   10.751    0.289   37.239    0.000
##    .wratspl (.21.)   37.211    0.692   53.771    0.000
##    .wjspl   (.22.)   42.411    0.713   59.443    0.000
##    .waisspl (.23.)   37.869    0.675   56.144    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     MATH              0.857    0.156    5.502    0.000
##     SPELL             0.894    0.158    5.676    0.000
##    .wratcalc          4.518    1.204    3.753    0.000
##    .wjcalc            3.911    0.655    5.970    0.000
##    .waiscalc          5.431    0.576    9.435    0.000
##    .wratspl           4.762    0.721    6.603    0.000
##    .wjspl             5.173    0.791    6.541    0.000
##    .waisspl           5.235    0.734    7.134    0.000

Now the models can be compared via chi-square difference testing.

anova(ConfigOutput, metricOutput)
## Chi Square Difference Test
## 
##              Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## ConfigOutput 16 10301 10444 19.215                              
## metricOutput 20 10300 10428 25.871     6.6558       4     0.1552

Here the metric model does not fit significantly worse than the configural model, so metric invariance is supported.

Now we can test for scalar invariance.

anova(metricOutput, scalarOutput)
## Chi Square Difference Test
## 
##              Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)  
## metricOutput 20 10300 10428 25.871                                
## scalarOutput 24 10302 10415 36.361     10.491       4    0.03293 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the scalar model fits significantly worse, so scalar invariance is not supported.