Deciding on the number of factors
PSY544 – Introduction to Factor Analysis
Week 9

Introduction
•The objective of exploratory (unrestricted) factor analysis is to determine the number and nature
of the common factors.
•Deciding on the number of factors to extract is key, not least because subsequent analyses (such
as rotation) depend on this decision.
•There’s a large literature on the how-many-factors problem. Unfortunately, most of it focuses on
developing mechanical rules that will indicate the “true” number of factors.

Introduction
•This is very misguided.
•First of all, there is no such thing as “the true number of factors”.
•Second, mechanical rules, while sometimes useful, do not work every time.

Introduction
•We need to recognize that no parsimonious model will hold exactly.
•We must use informed judgement and take into account various sources of information.
•At times, the decision won’t be easy, and the available information will not clearly point to a
single answer.

Introduction
•A lot of researchers use the mechanical rules or rules of thumb without careful judgement or
understanding.
•This is not effective and can result in misleading solutions or interpretations.
•The mechanical rules can (and often do) provide useful information, but none can be used
effectively by itself, without considering other information.

Introduction
•There is a trade-off in what we do: we would like to find a number of factors such that the model
fits reasonably well but still provides a considerable degree of parsimony.
•Ideal situation: to find a model that fits reasonably well, while a model with fewer factors fits
significantly worse and a model with more factors doesn’t fit significantly better.
•Also, the identified factors have to be meaningful and interpretable.

Pitt & Myung (2002)
https://ars.els-cdn.com/content/image/1-s2.0-S1364661302019642-gr1.jpg


Introduction
•Nowadays, simply doing an EFA is not enough (oh, the old days…). It is important to view EFA as
one step in the entire factor analysis process: it is very likely that you will subsequently
perform a CFA on similar data, so EFA can serve as a tool for narrowing the space of potential
models and identifying a couple of candidate models (each with a reasonable number of interpretable
common factors).

EFA, CFA, UFA, RFA…
•I have sometimes been using the terms exploratory, confirmatory, restricted, unrestricted, …, in
a pretty chaotic way, so let me explain.
•Traditionally, people distinguish EFA (exploratory) and CFA (confirmatory). But these labels
describe the purpose of the analysis rather than the model itself.
•It is better to speak of restricted (typically CFA) and unrestricted (typically EFA) factor
analysis. This simply refers to whether there are any restrictions on the factor loading matrix Λ.

Sources of error
•When performing EFA (or most statistical analyses, actually), we face two sources of error:
1.Sampling error (our sample correlation matrix R is not the population correlation matrix P)
2.Model error (even if we had P, the model would not fit it perfectly; all models are wrong)
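
The two sources of error can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the one-factor population, the loadings of 0.7, the extra 0.05 correlation, and the helper names `sample_R` and `rms_offdiag` are all hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-factor model: 6 MVs, all loadings 0.7 (assumed values).
lam = np.full((6, 1), 0.7)
model_P = lam @ lam.T
np.fill_diagonal(model_P, 1.0)      # unit diagonal of a correlation matrix

# Model error: the actual population P deviates slightly from the model.
# Here MVs 0 and 1 correlate a bit more than one factor can explain.
P = model_P.copy()
P[0, 1] = P[1, 0] = model_P[0, 1] + 0.05

def sample_R(P, n, rng):
    """Sample correlation matrix R from n multivariate normal draws."""
    x = rng.multivariate_normal(np.zeros(len(P)), P, size=n)
    return np.corrcoef(x, rowvar=False)

def rms_offdiag(A, B):
    """Root mean square difference over the off-diagonal elements."""
    iu = np.triu_indices_from(A, k=1)
    return np.sqrt(np.mean((A[iu] - B[iu]) ** 2))

# Model error does not depend on N; sampling error shrinks as N grows.
model_error = rms_offdiag(P, model_P)
sampling_error = {n: rms_offdiag(sample_R(P, n, rng), P) for n in (100, 10000)}

print("model error:   ", round(model_error, 4))
for n, err in sampling_error.items():
    print(f"sampling error (N = {n}):", round(err, 4))
```

Collecting more data shrinks only the second quantity; the first stays put however large the sample is.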

Number of factors
•OK, so we have established that there is no such thing as the true number of factors.
•Our objective, then, is to identify a plausible number of major common factors.
•What kind of information is available?

Number of factors
•Types of information available:
1.Rules of thumb (mostly based on eigenvalues)
2.Statistical tests
3.Common sense
4.Informed judgement

Rules of thumb
•Number of eigenvalues greater than 1
•Sometimes known as the Kaiser criterion or the Kaiser-Guttman criterion.
•According to Guttman (1954), if the model holds exactly in the population, the number of
eigenvalues of the population correlation matrix that are greater than 1 provides a lower bound to
the number of factors.

Rules of thumb
•Number of eigenvalues greater than 1
•Although there is theoretical justification for this rule at the population level, in the ideal
case where the model holds exactly, it is routinely used with sample correlation matrices.
•People use this VERY often. It has been repeatedly demonstrated to be misleading.

Rules of thumb
•Number of eigenvalues greater than 1
•Furthermore, the theoretical justification only applies to the population correlation matrix
(with unities on the diagonal), but most EFA software gives you the eigenvalues of the sample
reduced correlation matrix.
•It can serve as a guide or a reference point, but dogmatic application will (probably) lead you
nowhere.
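
As a rough illustration of the rule, the count can be computed directly from the eigenvalues of a correlation matrix. The two-factor population below, the sample size of 200, and the function name `kaiser_count` are assumptions made up for this sketch; it uses the unreduced matrix (unities on the diagonal).

```python
import numpy as np

def kaiser_count(R):
    """Number of eigenvalues of the correlation matrix R that exceed 1."""
    eigenvalues = np.linalg.eigvalsh(R)   # real eigenvalues of a symmetric matrix
    return int(np.sum(eigenvalues > 1.0))

# Hypothetical population: 6 MVs, two uncorrelated factors, loadings of 0.7.
lam = np.zeros((6, 2))
lam[:3, 0] = 0.7
lam[3:, 1] = 0.7
P = lam @ lam.T
np.fill_diagonal(P, 1.0)

# A sample correlation matrix drawn from that population (N = 200).
rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(6), P, size=200)
R = np.corrcoef(x, rowvar=False)

print("population count:", kaiser_count(P))
print("sample count:    ", kaiser_count(R))
```

The count is easy to obtain, which is part of why the rule is so popular. Per the slides, it should be read as one reference point rather than as a decision.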

Rules of thumb
•Scree plot
•Cattell (1966). Simply plot the eigenvalues of the sample correlation matrix (or the reduced
sample correlation matrix) in descending order and look for the last large discontinuity.
•Let’s look at an example.

Rules of thumb
https://www.ibm.com/support/knowledgecenter/en/SSLVMB_sub/spss/images/images_m-r/out_fac_scree_telco_01.jpg

Rules of thumb
•Scree plot
•In this case, we would choose m = 3.
•This procedure is very subjective and does not have any theoretical justification. Only an
informal rationale is available: if there are m factors, then there will be m relatively large
eigenvalues. The remaining (small) eigenvalues will only represent noise.

Rules of thumb
•Scree plot
•As before, this method can be informative to some extent, but should not be used exclusively.
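
A scree plot needs nothing more than the sorted eigenvalues. The sketch below prints a crude text version and reports the position of the single largest drop. That is only a stand-in for "the last large discontinuity", which in practice is judged by eye. The three-factor population and the name `scree_table` are hypothetical; with real data, a sample R would be passed in instead.

```python
import numpy as np

def scree_table(R):
    """Eigenvalues of R in descending order and the drop after each one."""
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
    drops = eigenvalues[:-1] - eigenvalues[1:]
    return eigenvalues, drops

# Hypothetical population: 9 MVs, three uncorrelated factors, loadings of 0.7.
lam = np.zeros((9, 3))
for j in range(3):
    lam[3 * j:3 * j + 3, j] = 0.7
P = lam @ lam.T
np.fill_diagonal(P, 1.0)

eigenvalues, drops = scree_table(P)

# Text-mode scree plot: one row of '#' per eigenvalue.
for i, value in enumerate(eigenvalues, start=1):
    print(f"{i:2d} {value:5.2f} " + "#" * int(round(value * 20)))

# Crude heuristic (an assumption of this sketch): position of the largest drop.
largest_drop_after = int(np.argmax(drops)) + 1
print("largest drop after eigenvalue", largest_drop_after)
```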

Goodness of fit tests
•Test of perfect fit
•Uses the likelihood-ratio test statistic to test the hypothesis that the model fits perfectly in
the population.
•We’ve covered this one already.
•For a model with m factors, rejecting the null hypothesis means we need more factors. When we fail
to reject the null, we should stop adding factors.

Goodness of fit tests
•Test of perfect fit
•Problems!
•The null hypothesis is not true, and we know that, so what’s the point in testing it?
•Sequential tests are not independent.
•Functionally a test of sample size: with a large enough N, even trivial misfit leads to rejection.
•Mechanical.
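
The "test of sample size" problem can be sketched numerically. The code assumes the common form of the statistic, T = (N - 1) * F, where F is the minimized ML discrepancy value, and df = ((p - m)^2 - (p + m)) / 2 for an unrestricted model with p MVs and m factors. The discrepancy value 0.02 is made up and held fixed across N, which is an idealization. The critical value uses the Wilson-Hilferty approximation rather than an exact chi-square quantile.

```python
import math

def efa_df(p, m):
    """Degrees of freedom of an unrestricted m-factor model for p MVs."""
    return ((p - m) ** 2 - (p + m)) // 2

def chi2_critical_05(df):
    """Approximate upper 5% point of chi-square (Wilson-Hilferty)."""
    z = 1.6449                      # upper 5% point of the standard normal
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

def lr_statistic(f_min, n):
    """Likelihood-ratio statistic T = (N - 1) * F (assumed standard form)."""
    return (n - 1) * f_min

p, m = 9, 3
df = efa_df(p, m)
crit = chi2_critical_05(df)
f_min = 0.02                        # hypothetical amount of misfit, held fixed

for n in (100, 500, 2000):
    T = lr_statistic(f_min, n)
    verdict = "reject" if T > crit else "do not reject"
    print(f"N = {n:5d}  T = {T:6.2f}  df = {df}  -> {verdict}")
```

The same amount of misfit is judged acceptable in a small sample and unacceptable in a large one, so the number of factors "required" by the test drifts upward with N.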

Fit indices
•RMSEA
•Selecting the number of factors (or selecting a couple of plausible numbers of factors) is
fundamentally a problem of model fit.
•We can sequentially estimate models with increasing numbers of factors and look at changes in
model fit.
•This, too, invites mechanical use. Don’t rely too much on the suggested cutoff values for RMSEA.
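
A minimal sketch of the sequential comparison, assuming the common sample formula RMSEA = sqrt(max((T - df) / (df * (N - 1)), 0)); some software uses N instead of N - 1. The test statistics below are invented for illustration (p = 9 MVs, N = 500) and do not come from any real analysis.

```python
import math

def rmsea(T, df, n):
    """Sample RMSEA point estimate from test statistic T, df, and sample size n."""
    return math.sqrt(max((T - df) / (df * (n - 1)), 0.0))

n = 500
# Hypothetical results: m -> (T, df), with df = ((p - m)^2 - (p + m)) / 2, p = 9.
results = {1: (210.0, 27), 2: (85.0, 19), 3: (20.0, 12), 4: (5.0, 6)}

for m, (T, df) in results.items():
    print(f"m = {m}  T = {T:6.1f}  df = {df:2d}  RMSEA = {rmsea(T, df, n):.3f}")
```

What matters in such a table is how the values change from one m to the next, not whether any single value clears a conventional cutoff.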

Fit indices
•Tucker-Lewis Index (TLI)
•The TLI is a so-called incremental fit index. What does that mean?
•The index is based on comparing the fit of a model with m factors to the fit of two reference
models.
•The first one is the “world’s worst model”, the so-called “null model”, with m = 0. This model
implies zero correlation between the MVs in the population.
•The second one is the “world’s best model”, the so-called “ideal model”, which fits perfectly in
the population.
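
The comparison with the two reference models can be sketched using the usual sample formula, assumed here: TLI = (T0/df0 - Tm/dfm) / (T0/df0 - 1), where T0 and df0 belong to the null model and the 1 is the expected value of T/df for a perfectly fitting model. All numbers are invented for illustration (p = 9 MVs, so the null model has df0 = p(p - 1)/2 = 36).

```python
def tli(T_null, df_null, T_model, df_model):
    """Tucker-Lewis index: where the model sits between null and ideal fit."""
    null_ratio = T_null / df_null       # misfit per df of the null model
    model_ratio = T_model / df_model    # misfit per df of the m-factor model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)

T_null, df_null = 1500.0, 36            # hypothetical null-model result
# Hypothetical results for m = 1..3 factors: m -> (T, df).
results = {1: (210.0, 27), 2: (85.0, 19), 3: (20.0, 12)}

for m, (T, df) in results.items():
    print(f"m = {m}  TLI = {tli(T_null, df_null, T, df):.3f}")
```

A value of 0 means the model does no better than the null model per degree of freedom, and a value of 1 means T equals its df, as expected under perfect fit.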

Selecting the number of factors
1.Consider prior hypotheses about the number of factors
2.There is no need for a single “right answer” at the exploratory phase of an analysis
3.Consider some rules of thumb as a source of guidance, but don’t rely on them blindly (scree plot,
eigenvalues…)
4.Consider the results of goodness of fit tests and measures of fit, including RMSEA and TLI
5.Based on what you have observed, decide on the optimal number of factors, or a couple of
alternatives

Selecting the number of factors
6.Go the extra mile and extract one or more additional factors, and see what happens. It’s better
to “over-factor” than to “under-factor”
7.Look for converging evidence
8.Use your judgment and knowledge about the data. If someone says you are not allowed to, they are
wrong.

What not to do
•Use one piece of evidence or one mechanical approach by itself
•Fail to consider multiple sources of information