Ben Kite, Center for Research Methods and Data Analysis, University of Kansas <bakite@ku.edu>
Please visit http://crmda.ku.edu/guides
Keywords: Structural Equation Modeling, Exploratory Factor Analysis, R, factanal
Abstract
This guide outlines how to specify an exploratory factor analysis in R. An example with 6 manifest variables measuring 1 or 2 latent factors is presented. The model estimation results can be compared to the same model fitted with Mplus.
The data file “job_placement” must be read into the R session.
dat <- read.csv("../../data/job_placement.csv", header = FALSE)
Because the data file does not have column (variable) names, the names must be specified.
colnames(dat) <- c("id", "wjcalc", "wjspl", "wratspl", "wratcalc", "waiscalc", "waisspl", "edlevel", "newschl", "suspend", "expelled", "haveld", "female", "age")
In the original data file the missing values are coded as “99999”. These values need to be recoded to NA so that R recognizes them as missing.
dat[dat == 99999] <- NA
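As a small illustration of this recoding step, the sketch below applies the same assignment to a toy data frame. The objects toy and toy2, the temporary file, and all values are made up for illustration and are not part of the job_placement data. The second half shows that the na.strings argument of read.csv() could do the recoding at read time; that is an alternative, not the approach used in this guide.

```r
# Toy data frame (hypothetical values) using 99999 as the missing-value code
toy <- data.frame(id = 1:3, x = c(10, 99999, 12), y = c(99999, 5, 6))

# Same recoding as in the guide: every cell equal to 99999 becomes NA.
# Note that the comparison covers all columns, including id.
toy[toy == 99999] <- NA
colSums(is.na(toy))  # number of missing values per column

# Alternative sketch: declare the missing-value code while reading the file
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,10,99999", "2,99999,5", "3,12,6"), tmp)
toy2 <- read.csv(tmp, header = FALSE, na.strings = "99999")
colnames(toy2) <- c("id", "x", "y")
colSums(is.na(toy2))
```

Both approaches mark the same cells as missing in this toy case. The in-place recoding would also convert a legitimate value of 99999 (for example, an id) to NA, so it is worth checking that no variable uses that value as real data.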
Next, the variables used in the EFA are placed in a separate data frame, which is the data frame used in the analysis. The dat[ , 2:7] command creates a data frame containing all rows, but only columns 2 through 7, of the “dat” data frame.
dat1 <- dat[ , 2:7]
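Selecting columns by position depends on the column order staying fixed. The sketch below, using a hypothetical toy data frame with a few of the variable names from this guide, suggests that selecting by name gives the same result and may be easier to read.

```r
# Toy data frame (hypothetical values) with some of the guide's variable names
toy <- data.frame(id = 1:2, wjcalc = c(1, 2), wjspl = c(3, 4), age = c(30, 40))

by_position <- toy[ , 2:3]                 # all rows, columns 2 and 3
by_name <- toy[ , c("wjcalc", "wjspl")]    # all rows, columns chosen by name

identical(by_position, by_name)
```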
The last data manipulation step is to remove cases with missing values from the analysis data frame. This is equivalent to specifying LISTWISE = ON under the DATA command in Mplus.
dat1 <- na.omit(dat1)
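Because listwise deletion can discard a large share of the sample, it may be useful to check how many cases were removed. The following is a minimal sketch on a hypothetical toy data frame; the object names and values are illustrative only.

```r
# Toy data frame (hypothetical values) with missing values in rows 2 and 3
toy <- data.frame(a = c(1, NA, 3, 4), b = c(2, 5, NA, 8))

complete <- na.omit(toy)

nrow(toy) - nrow(complete)    # number of cases dropped
attr(complete, "na.action")   # which rows were removed
```

The same two lines could be applied to the analysis data frame by saving a copy before calling na.omit().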
Now the EFA can be run with 1 and 2 factors extracted.
output1 <- factanal(dat1, 1, rotation = "varimax")
output1
Call:
factanal(x = dat1, factors = 1, rotation = "varimax")
Uniquenesses:
  wjcalc    wjspl  wratspl wratcalc waiscalc  waisspl
   0.728    0.093    0.108    0.695    0.749    0.116
Loadings:
         Factor1
wjcalc   0.522
wjspl    0.953
wratspl  0.945
wratcalc 0.552
waiscalc 0.501
waisspl  0.940
               Factor1
SS loadings      3.511
Proportion Var   0.585
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 461.38 on 9 degrees of freedom.
The p-value is 1.06e-93
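The pieces of the one-factor output above are related. Because factanal() analyzes the correlation matrix, each uniqueness is 1 minus the sum of that variable's squared loadings, "SS loadings" is the sum of squared loadings for the factor, and "Proportion Var" is that sum divided by the number of variables. The sketch below recomputes these quantities from the printed loadings. The results should agree with the output only up to rounding, because the printed loadings are rounded to three decimals.

```r
# Loadings copied from the one-factor output printed above
loadings1 <- c(wjcalc = 0.522, wjspl = 0.953, wratspl = 0.945,
               wratcalc = 0.552, waiscalc = 0.501, waisspl = 0.940)

# Uniqueness = 1 - communality (squared loading, with one factor)
uniq <- 1 - loadings1^2
round(uniq, 3)

# Sum of squared loadings and the proportion of variance it represents
ss <- sum(loadings1^2)
ss
ss / length(loadings1)
```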
output2 <- factanal(dat1, 2, rotation = "varimax")
output2
Call:
factanal(x = dat1, factors = 2, rotation = "varimax")
Uniquenesses:
  wjcalc    wjspl  wratspl wratcalc waiscalc  waisspl
   0.184    0.089    0.107    0.096    0.477    0.112
Loadings:
         Factor1 Factor2
wjcalc   0.230   0.873
wjspl    0.907   0.298
wratspl  0.894   0.306
wratcalc 0.248   0.918
waiscalc 0.281   0.667
waisspl  0.896   0.293
               Factor1 Factor2
SS loadings      2.617   2.318
Proportion Var   0.436   0.386
Cumulative Var   0.436   0.823
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.8 on 4 degrees of freedom.
The p-value is 0.434
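The degrees of freedom reported for the two tests can be checked by hand. For an EFA with p observed variables and k factors, the degrees of freedom are ((p - k)^2 - (p + k)) / 2. The sketch below evaluates this for the two models above and recomputes the p-value of the two-factor test from the printed chi-square statistic. The helper name efa_df is hypothetical and introduced only for illustration.

```r
# Degrees of freedom for an EFA with p variables and k factors
# (efa_df is an illustrative helper, not part of any package)
efa_df <- function(p, k) ((p - k)^2 - (p + k)) / 2

efa_df(6, 1)  # 9, as in the one-factor output
efa_df(6, 2)  # 4, as in the two-factor output

# Upper-tail chi-square probability for the two-factor test,
# using the rounded statistic printed above
pchisq(3.8, df = efa_df(6, 2), lower.tail = FALSE)  # about 0.434
```

The one-factor model is rejected by this test, while the two-factor model is not, which is consistent with the six variables measuring two latent factors.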
To see the same EFA models fitted with Mplus, see efa-01.html.
Below is a summary of the R session used to generate this example.
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] stationery_0.98.5.7
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 digest_0.6.18 plyr_1.8.4 xtable_1.8-3
[5] magrittr_1.5 stats4_3.5.1 evaluate_0.12 zip_1.0.0
[9] stringi_1.2.4 pbivnorm_0.6.0 openxlsx_4.1.0 rmarkdown_1.11
[13] tools_3.5.1 stringr_1.3.1 foreign_0.8-71 kutils_1.62
[17] yaml_2.2.0 xfun_0.4 compiler_3.5.1 mnormt_1.5-5
[21] htmltools_0.3.6 knitr_1.21 lavaan_0.6-3
Available under Creative Commons license 3.0