Conceptual overview
PSY544 – Introduction to Factor Analysis
Week 1

Basic principles
•Consider the type of data we typically apply factor analysis to:
•Multivariate data – data for a sample of individuals on a number of manifest (measured, observed) variables (like psychological tests)
•Data matrix: one column for each variable, one row for each person

Basic principles
•Entries in the data matrix represent the scores of each person on each manifest variable
•The underlying premise of factor analysis is that these data are not completely random, but have some systematic aspects that can be studied

Basic principles
•Factor analysis is one way to study this underlying structure
•i.e., we are trying to arrive at a simpler understanding of the data
•FA was originally developed to study the structure of human mental abilities. Factor models provide a formal basis for various theories of intelligence and the structure of mental abilities.
•We will mostly use examples from this domain in the course
•However, FA is also used in other domains, such as personality, attitudes, etc.

Basic principles
•Two aspects of FA:
•Theory – statistical models specifying the underlying structure of the data
•Methodology – procedures that allow us to analyze the data and reveal the specified structure

Key terms and definitions
•Manifest variable – a variable that can be directly measured (or observed)
•Latent variable – a variable that cannot be directly measured (or observed) – a hypothetical construct. A latent variable is a factor in factor analysis. Thus, a factor is a variable, and individuals have scores on those factors (hypothetically).
•Population – the entire set of individuals of interest
•Sample – a selected group of individuals from the population (N persons)

Key terms and definitions
•Data matrix X, with N rows (individuals) and p columns (variables); xij is the score of person i on variable j:

        x11  x12  ...  x1p
        ...  ...  xij  ...
        xN1  xN2  ...  xNp

Key terms and definitions
•What can we observe in these data?
•Variation of each variable over individuals (measured by variance / SD)
•Covariation in a pair of variables over individuals (measured by covariance / correlation)

Key terms and definitions
•Correlation matrix R (p manifest variables by p manifest variables, symmetric, so rjk = rkj):

        1    r12  r13  ...  r1p
        r21  1    r23  ...  r2p
        r31  r32  1    ...  r3p
        ...  ...  ...  ...  ...
        rp1  rp2  rp3  ...  1

•To understand the pattern of relationships among the MVs, we could just try to describe it in terms of the entries of this matrix
•However, this gets increasingly difficult as p increases.

•The objective of factor analysis, then, is to uncover and understand the structure that produces the correlations in the data
•Essential to this objective is the notion of factors
•Factors are latent, unobservable variables – hypothetical constructs
•The basic principle of FA is that there exists a small number of factors (within a particular domain) which influence the MVs and thus produce the correlations (covariances) between the manifest variables.
•A correlation between two MVs is due to these two MVs being dependent on one or more of the same factor(s) – a minimal simulation illustrating this idea follows below.

•So, again, what we want is to identify the number and nature of the factors that produce the observed correlations between the MVs.
•Interrelationships between all possible MVs in a given domain can be explained by a limited number of factors. The number of factors is considered to be (much) smaller than the number of MVs (if this were not the case, we would gain very little by doing factor analysis) → e.g., it is assumed that a limited number of mental abilities will explain the relationships between all ability tests …no MV single-handedly represents a distinct ability or trait.
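To make the idea that shared factors produce correlations concrete, here is a minimal simulation sketch (not part of the original slides; the loadings .7 and .6, the seed, and the sample size are arbitrary, illustrative choices): two manifest variables are each generated as a loading times the same latent factor plus independent noise, and their correlation comes out near the product of the two loadings.

```python
import numpy as np

rng = np.random.default_rng(544)
N = 100_000                      # large sample so the estimate is stable

# One common latent factor, standardized (mean 0, SD 1)
factor = rng.standard_normal(N)

# Two manifest variables influenced by the same factor
# (loadings .7 and .6 are arbitrary, illustrative values)
lambda1, lambda2 = 0.7, 0.6
mv1 = lambda1 * factor + np.sqrt(1 - lambda1**2) * rng.standard_normal(N)
mv2 = lambda2 * factor + np.sqrt(1 - lambda2**2) * rng.standard_normal(N)

# The correlation between the two MVs is (approximately) the
# product of their loadings on the shared factor: .7 * .6 = .42
print(np.corrcoef(mv1, mv2)[0, 1])   # ≈ 0.42
```

If the two variables depended on no common factor, their expected correlation would be zero; this is the sense in which factors "produce" the observed correlations.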
•Again, the factors influence the MVs. One of the objectives in FA is to estimate the degree of these influences. We measure them by means of factor loadings.
•The numerical values of the factor loadings indicate the strength of a factor's influence on an MV (a zero indicates no influence). Factor loadings are equivalent to regression coefficients, standing for the influence of a factor (independent variable) on an MV (dependent variable).
•The pattern of factor loadings helps us determine the nature of a factor …in other words, a factor is defined by the subset of MVs that it substantially influences.

To recap:
•Correlations between manifest variables exist because the manifest variables are influenced by one or more of the same factors (e.g., text comprehension and verbal fluency are correlated because both are influenced by a common underlying factor of verbal ability).
•Our aim is to determine the number and nature of the underlying factors and their pattern of influence on the manifest variables.
•We want to obtain a simple explanation of the relationships in the data using a small number of factors.

Example
•Suppose we have scores from a sample of individuals on 4 performance measures: paragraph comprehension (PC), vocabulary (VO), arithmetic skills (AR), and mathematical problem solving (MPS). We get the following correlation matrix:

         PC    VO    AR    MPS
  PC    1
  VO    .49   1
  AR    .14   .07   1
  MPS   .48   .42   .48   1

Example
•We would like to identify the underlying factors that explain these correlations. Thus, we employ factor analysis methods and obtain a factor loading matrix:

         Factor 1   Factor 2
  PC       .70        .10
  VO       .70        .00
  AR       .10        .70
  MPS      .60        .60

Example
•Elements in the matrix represent the linear influence of each factor on each measure (a short numerical check of how these loadings reproduce the correlations appears in the sketch below).
•In this course, we will study methods that will allow us to obtain such interpretable factor loading matrices.

•Keep in mind that we are using a model – one which represents some hypothesized structure of the observed data. Any mathematical model is – at least to some extent – wrong and does not perfectly correspond to reality.
•A model that makes sense conceptually but does not fit reasonably well is useless.
•A model that fits well but does not make sense is useless as well.
•Factor analysis is not applicable to just any data.

•In the world of factor analysis, situations differ regarding the existence of prior hypotheses / knowledge about the number and nature of the factors:
•Exploratory (unrestricted) FA: we have little prior idea of how many and what kind of factors there are.
•Confirmatory (restricted) FA: we do have a hypothesis (or hypotheses) about the number and nature of the factors.
•...in both cases, the underlying theoretical model is the same!
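The loadings in the example can be checked directly against the correlation matrix: for standardized variables and uncorrelated factors, the model-implied correlation between two MVs is the sum of the products of their loadings. A minimal sketch of that check (not part of the original slides) follows, using the variable names and numbers from the example above.

```python
import numpy as np

# Factor loading matrix from the example (rows = MVs, columns = factors)
mvs = ["PC", "VO", "AR", "MPS"]
loadings = np.array([
    [0.70, 0.10],   # paragraph comprehension
    [0.70, 0.00],   # vocabulary
    [0.10, 0.70],   # arithmetic skills
    [0.60, 0.60],   # mathematical problem solving
])

# With uncorrelated common factors, the model-implied correlation between
# MV j and MV k is the sum over factors of loading_j * loading_k,
# i.e., an off-diagonal element of loadings @ loadings.T
implied = loadings @ loadings.T

for j in range(len(mvs)):
    for k in range(j):
        print(f"{mvs[j]}-{mvs[k]}: implied r = {implied[j, k]:.2f}")
# VO-PC: .49, AR-PC: .14, AR-VO: .07, MPS-PC: .48, MPS-VO: .42, MPS-AR: .48
# These match the example correlation matrix. The diagonal elements are
# less than 1 because the two factors do not explain all of each MV's variance.
```

In this small example the implied correlations reproduce the observed ones exactly; with real data the reproduction is only approximate.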
A bit of history
•Factor analysis began with the study of mental abilities.
•Charles Spearman proposed the first factor model in 1904:
•Performance on any test is a function of two factors – a general ability factor (Spearman's g) common to all ability tests, and a specific ability factor relevant only to the specific test in question.
•→ The two-factor theory of intelligence
•Ability tests correlate because they all depend on the general factor.

A bit of history
•Burt and Vernon, on the other hand, proposed a hierarchical model of human abilities:
•The human mind is organized in a hierarchy of abilities.
•General ability sits atop this hierarchy.
•More specific abilities are located lower in the hierarchy.

A bit of history
•The Common Factor Model of L. L. Thurstone has been the most prominent approach to FA since the 1940s. Thurstone disagreed with both the notion of g and a hierarchy of abilities.
•According to Thurstone, MVs depend on two kinds of underlying factors:
•Common factors, which are common to more than one MV
•Unique factors, which influence only one MV. Unique factors do not explain correlations between MVs.
•The p manifest variables depend on m common factors and p unique factors, where m < p.

The Common Factor Model
•Therefore, for a given set of p MVs, there are m + p factors.
•Each unique factor has two components:
•a specific factor
•error of measurement
•…the specific factor represents systematic influences affecting only that particular MV; the error component represents random error.

The Common Factor Model
•Important assumption: in the model, the unique factor scores for different MVs are assumed to be uncorrelated over all persons. Therefore, all partial correlations between MVs, controlling for the effect of the common factors, are assumed to be zero.
•In other words, correlations between MVs are due only to the common factors (that's why they're called common).
•This assumption refers to the population.

The Common Factor Model
•Which factors are common and which are specific depends on the manifest variables in the dataset.
•If we change the set of MVs by introducing new MVs or deleting MVs, we can potentially change specific factors into common factors, and so on.

The Common Factor Model
•The model will always be wrong to some degree (it's a model, after all). What are some of the ways the model could be wrong?
•1) The assumption of linearity – the MVs are specified as linear functions of the factors. Nobody really thinks the real world is perfectly linear.
•2) The number of common factors is generally assumed to be small (m << p). In reality, there are probably many, many influences on a score. However, we hope to identify the non-negligible ones.
•We should recognize that the common factors will not perfectly explain the variation and covariation of the manifest variables.

The Common Factor Model
•The model equation looks like a multiple regression equation:
•The manifest variables are the dependent variables
•The factors are the independent variables
•The factor loadings are the regression weights / coefficients
•The factor analysis model is like a set of multiple linear regressions where the independent variables are unobservable (a sketch of this view follows below).
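As a concrete, purely illustrative rendering of that idea (not part of the original slides), the sketch below simulates data from a two-common-factor model: each manifest variable is built as a weighted sum of unobservable common factors plus its own unique factor, exactly like a regression whose predictors we never get to see. The loadings reuse the numbers from the example earlier; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1904)          # seed is arbitrary
N, p, m = 5_000, 4, 2                      # persons, MVs, common factors

# Loadings (regression weights of MVs on the common factors), from the example
L = np.array([[0.70, 0.10],
              [0.70, 0.00],
              [0.10, 0.70],
              [0.60, 0.60]])

# Unobservable scores: m common factors and p unique factors per person,
# mutually uncorrelated by construction
common = rng.standard_normal((N, m))
unique_sd = np.sqrt(1 - (L**2).sum(axis=1))   # so each simulated MV has variance 1
unique = rng.standard_normal((N, p)) * unique_sd

# "Regression" with unobservable predictors: X = common @ L.T + unique
X = common @ L.T + unique

# The sample correlations among the simulated MVs come out close to the
# example correlation matrix (.49, .14, .48, .07, .42, .48 off-diagonal)
print(np.round(np.corrcoef(X, rowvar=False), 2))
```

Only X would be observed in practice; the point of the factor analysis methods covered later in the course is to work backwards from the correlations in X to estimates of the loadings.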