Introduction to CFA
PSY544 – Introduction to Factor Analysis
Week 11

Introduction
•So far, we have covered only exploratory factor analysis (EFA), or “unrestricted” factor analysis.
•We’ve covered a huge chunk of material, and you should be proud of yourself! You now have all the knowledge you need to become master exploratory factor analysts.
•You’ve also managed to see me twice a week and not jump out of the window.

Introduction
•Today, we begin the rest of the course, which will cover confirmatory factor analysis (CFA), or “restricted” factor analysis.
•The difference between EFA and CFA lies in whether a prior hypothesis about the factor structure is incorporated into the model specification.
•In EFA, the analyst seeks to explore the number and nature of the major common factors. Rotation to simple structure is usually necessary.
•In CFA, the analyst has a specific prior hypothesis about the number and nature of the major common factors. This hypothesis is directly incorporated into the model specification. No rotation is involved.

Introduction
•In EFA, the factor loadings can be obtained directly once the unique factor variances (or communalities) have been estimated. In CFA, this is no longer possible.
•All parameters in CFA have to be estimated simultaneously by numerically minimizing some discrepancy function.
•As a result, CFA tends to be slower than EFA, even though the number of estimated parameters is smaller.
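To make “estimated simultaneously by numerically minimizing a discrepancy function” concrete, here is a minimal sketch using only the Python standard library. It rests on simplifying assumptions of its own: a one-factor model for three standardized variables, an OLS discrepancy computed over the off-diagonal correlations only, unique variances taken as 1 − λ² so that the unit diagonal is reproduced exactly, and plain gradient descent as the optimizer. The correlations are hypothetical, chosen so that loadings of .8, .7 and .6 reproduce them exactly. Real SEM programs use more sophisticated optimizers; the point is only that all loadings are updated together rather than solved for in closed form.

```python
# Hypothetical observed correlations among three standardized variables,
# chosen so that loadings (0.8, 0.7, 0.6) reproduce them exactly.
R = [
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
]


def ols_discrepancy(loadings, r):
    """Sum of squared off-diagonal residuals r_ij - lambda_i * lambda_j.

    The diagonal is ignored because unique variances are assumed to be
    1 - lambda_i**2, which reproduces the unit diagonal exactly.
    """
    p = len(loadings)
    return sum(
        (r[i][j] - loadings[i] * loadings[j]) ** 2
        for i in range(p)
        for j in range(p)
        if i != j
    )


def gradient(loadings, r):
    """Analytic gradient of ols_discrepancy with respect to each loading."""
    p = len(loadings)
    grad = []
    for k in range(p):
        g = 0.0
        for j in range(p):
            if j != k:
                # Each pair (k, j) appears twice in the full-matrix sum,
                # hence the factor 4 instead of 2.
                g += -4.0 * (r[k][j] - loadings[k] * loadings[j]) * loadings[j]
        grad.append(g)
    return grad


def fit_one_factor(r, start=0.5, learning_rate=0.05, n_iter=5000):
    """Estimate all loadings simultaneously by plain gradient descent."""
    loadings = [start] * len(r)
    for _ in range(n_iter):
        grad = gradient(loadings, r)
        # Every loading is updated at once from the same gradient vector.
        loadings = [lam - learning_rate * g for lam, g in zip(loadings, grad)]
    return loadings


estimates = fit_one_factor(R)
print([round(lam, 3) for lam in estimates])
print(round(ols_discrepancy(estimates, R), 10))
```

With three indicators and one factor this toy model is just-identified, so the minimum discrepancy is (numerically) zero; with more indicators the same loop would settle at a non-zero minimum.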

Software
•A plethora of software exists for confirmatory factor analysis or, more generally, for structural equation modeling (of which FA is a special case).
•LISREL, EQS, Mplus, RAMONA, SePATH, Mx, AMOS…
•…meh. In this course, we will use R and the lavaan package. For all its quirks and its steep learning curve, it’s a modern piece of software that allows for great flexibility. Oh, and you don’t have to sell a kidney to work with it – it’s free.

Exploratory (Unrestricted) Factor Analysis
•As you already know, in EFA there are typically no (solid) prior ideas about the number of common factors or their nature (the positions of the zero loadings).
•Sure, the analyst might have *some* ideas about the variables being analyzed, but these don’t need to be stated, nor do they need to be correct.
•If the analyst conducts a blind rotation (like Quartimax) of the estimated factors, they will never know whether the failure to see non-zero loadings where expected is due to an incorrect hypothesis or to a rotational criterion that is inadequate for the given situation.

Exploratory (Unrestricted) Factor Analysis
•Also, in EFA, the decision about when to stop (e.g., how many factors to retain) relies heavily on the analyst’s judgement, and the entire thing is largely data-driven rather than theory-driven.
•That’s fine, as long as it’s acknowledged as such.

Confirmatory (Restricted) Factor Analysis
•CFA should be used only when there is a solid prior hypothesis about the number and nature of the
common factors.
•It’s totally fine (actually preferable in a lot of cases) to have several competing hypotheses.
•The analyst must be able to specify the number and positions of the zero loadings before the analysis. The corresponding models are then fit to the data and their model-data fit is assessed, which indicates the extent to which the prior hypothesis is consistent with empirical reality.

Confirmatory (Restricted) Factor Analysis
•CFA is not a data-driven enterprise. It’s theory-driven.
•CFA can seduce you into using it in a data-driven way. That’s dangerous, because it can lead to “confirmatory” models that are merely statistical artifacts.
•A confirmatory model is still a model. As such, it is nothing but an approximation. Be just as cautious in this regard as you would be when performing EFA.

Restrictions
•Restrictions usually represent aspects of a prior hypothesis; for example, a loading fixed at zero encodes the hypothesis that the variable does not depend on that factor.
•They affect the model-implied correlation/covariance matrix and therefore the fit of the model, which makes them testable.
•Because CFA imposes restrictions, confirmatory models typically fit the data worse than exploratory models with the same number of factors, which are free of such restrictions.
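One way to see that a restriction changes the model-implied matrix is to compute Σ = ΛΦΛ′ + Ψ for the same model with and without a fixed-zero loading. The sketch below assumes this standard covariance structure, standardized variables (each unique variance set to 1 minus the communality), and made-up values for a hypothetical two-factor, four-indicator model; the only difference between the two versions is whether x4 is allowed to load on the first factor.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def implied_correlations(loadings, phi):
    """Sigma = Lambda Phi Lambda' + Psi for standardized variables.

    Adding Psi = diag(1 - communality) sets every diagonal element to 1,
    so only the off-diagonal elements of Lambda Phi Lambda' are kept.
    """
    common = matmul(matmul(loadings, phi), transpose(loadings))
    p = len(common)
    return [[1.0 if i == j else common[i][j] for j in range(p)] for i in range(p)]


# Hypothetical factor correlation matrix and loading matrices.
PHI = [[1.0, 0.4], [0.4, 1.0]]
RESTRICTED = [[0.7, 0.0], [0.8, 0.0], [0.0, 0.6], [0.0, 0.7]]    # x4 on F1 fixed at 0
UNRESTRICTED = [[0.7, 0.0], [0.8, 0.0], [0.0, 0.6], [0.3, 0.7]]  # x4 cross-loads on F1

restricted = implied_correlations(RESTRICTED, PHI)
unrestricted = implied_correlations(UNRESTRICTED, PHI)
print(round(restricted[0][3], 3), round(unrestricted[0][3], 3))  # 0.196 0.406
```

Only the implied correlations involving x4 change; if the observed correlation between x1 and x4 is far from .196, the zero restriction will show up as misfit.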

The CFA model
•How many degrees of freedom does our model have?
•Well, first of all, let’s count the number of parameters we are freely estimating. That’s six factor loadings, six unique variances, and one correlation between factors: a total of 13 parameters to estimate.
•Our data is a 6 x 6 correlation / covariance matrix, which has [6 * (6+1)]/2 = 21 unique (non-redundant) elements. This is the number of degrees of freedom available before any parameters are estimated.
•In our case, the model has 21 – 13 = 8 degrees of freedom.
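The counting above can be sketched in a few lines of Python. The loading pattern is an assumption: it takes the model to be two correlated factors with three indicators each, factor variances fixed at 1, one free factor correlation, and one free unique variance per variable, which reproduces the 13 free parameters counted above. The function names are hypothetical helpers, not part of any package.

```python
def n_unique_elements(p):
    """Number of non-redundant elements in a p x p covariance/correlation matrix."""
    return p * (p + 1) // 2


def count_free_parameters(pattern, n_factor_correlations):
    """Count free parameters in a CFA model.

    pattern: one row per variable, one entry per factor; True marks a free
    loading, False a loading fixed at zero (not estimated).
    Assumes factor variances are fixed at 1 and each variable has one
    free unique variance.
    """
    free_loadings = sum(entry for row in pattern for entry in row)
    unique_variances = len(pattern)
    return free_loadings + unique_variances + n_factor_correlations


def degrees_of_freedom(pattern, n_factor_correlations):
    return n_unique_elements(len(pattern)) - count_free_parameters(
        pattern, n_factor_correlations
    )


# Hypothetical pattern: x1-x3 load on factor 1, x4-x6 on factor 2.
PATTERN = [
    [True, False],
    [True, False],
    [True, False],
    [False, True],
    [False, True],
    [False, True],
]

print(n_unique_elements(6))               # 21
print(count_free_parameters(PATTERN, 1))  # 13
print(degrees_of_freedom(PATTERN, 1))     # 8
```

Freeing a cross-loading (turning one False into True) would add a parameter and cost one degree of freedom.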

Path diagrams
•Path diagrams are a standard way to communicate a CFA model.
•Let’s spend some time on the basics.

Path diagrams
•Rectangles denote manifest variables (e.g., Age)
•Circles denote latent variables (e.g., Int)

Path diagrams
•Single-headed, straight arrows denote a regression path (e.g., between Int and Age)
•Double-headed, curved arrows denote a correlation / covariance (e.g., between Age and Height)

Path diagrams
•A path diagram should contain as much model-related information as possible, ideally all of it.
•Each arrow stands for a parameter and so should be labeled with the value of that parameter.

Path diagrams
http://tx.shu.edu.tw/~purplewoo/Literature/!DataAnalysis/confirmatory%20factor%20analysis-intro.files/path1.gif

Path diagrams
https://openi.nlm.nih.gov/imgs/512/347/3063185/PMC3063185_1477-7525-9-12-1.png


Path diagrams
•Some software allows you to specify a model using path diagrams alone (LISREL, for instance), and for some software, path diagrams are the only way to specify a model (AMOS, I think). However, even in AMOS, the software “translates” the information contained in the path diagram into the model matrices.

Estimation
•As with EFA, estimation of parameters in CFA follows the basic principles of minimum discrepancy estimation.
•We are looking for the vector of parameter values for which the model-implied covariance / correlation matrix is at minimum “distance” from the observed covariance / correlation matrix (in other words, the value of the discrepancy function is at its minimum).

Estimation
•Conceptually speaking:
•Ordinary least squares (OLS; also called unweighted least squares, ULS) – summed squared differences between the observed and the model-implied matrices
•Generalized least squares (GLS) – the squared differences between the observed and the model-implied matrices are weighted using the observed matrix (formally, its inverse), so that a discrepancy in a larger element is penalized less than a discrepancy in a smaller element
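A minimal Python sketch of the two discrepancy functions, using their usual matrix forms F_OLS = 1/2 tr[(S − Σ)²] and F_GLS = 1/2 tr[(S⁻¹(S − Σ))²], where S is the observed and Σ the model-implied matrix. The 2 x 2 matrices are made up for illustration, and the inverse is a bare-bones Gauss-Jordan elimination with partial pivoting, so treat this as a sketch rather than production code.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a with the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def f_ols(s, sigma):
    """Unweighted: half the sum of squared residuals (S - Sigma)."""
    d = subtract(s, sigma)
    return 0.5 * trace(matmul(d, d))


def f_gls(s, sigma):
    """Residuals are weighted by the inverse of the observed matrix S."""
    w = matmul(inverse(s), subtract(s, sigma))
    return 0.5 * trace(matmul(w, w))


S = [[1.0, 0.5], [0.5, 1.0]]      # hypothetical observed correlations
SIGMA = [[1.0, 0.3], [0.3, 1.0]]  # hypothetical model-implied correlations
print(round(f_ols(S, SIGMA), 4))  # 0.04
print(round(f_gls(S, SIGMA), 4))  # 0.0889
```

The same residual of .2 yields different values under the two functions, which is why OLS and GLS can lead to different parameter estimates for the same model and data.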

Estimation
•Of course, constrained (fixed) parameters, such as loadings fixed at zero, are not estimated.