Fitting the Common Factor Model
PSY544 – Introduction to Factor Analysis
Week 5

The Common Factor Model
•You learned a LOT already!
•More than the majority of factor analysis practitioners know about the model :) (pretty sad, huh?)

Fitting the model
•One important thing to note is that these models are intended for a population: they are population models, describing how things work in a population.
•Anyway, at the beginning we learned that there are two sides to factor analysis: theory and methodology.
•What we have covered so far was the theory (the model itself).
•Now, we will focus on the methodology (how to fit the model to data / how to estimate the unknowns in the model).

Fitting the model
•More specifically, we will focus on the theoretical basis for fitting the model. Later on in the course, we will cover the actual thing in practice (software and examples).
•A model represents some hypothesized structure of the data. Different methods are available for fitting the model to data, obtaining estimates of the model parameters (the elements of the model matrices), and telling us how well the model fits the data.

Fitting the model
•For the sake of argument, we will consider the hypothetical scenario where the population correlation matrix P is known and the model holds exactly in the population (i.e., the model explains P perfectly).
•This will never ever be the case in practice, but it is a better starting point for understanding the principles.
•Later, we will drop these assumptions, no worries.

The population correlation matrix

Rotational indeterminacy
•Let’s look at an example, using the example data we have seen before.
•The matrix P is given as follows:

The communality problem
•Many solutions to the communality problem have been suggested.
•The one that “won” (it was, and still is, the most widely used) was suggested by Louis Guttman in 1940.
•Guttman suggested squared multiple correlations (SMCs) as the initial approximations to communalities.

The communality problem
•Just what is a squared multiple correlation (SMC)?
•Imagine you have p manifest variables. You can try to predict the j-th manifest variable from the other (p - 1) manifest variables, linear regression-style.
•This prediction will be imperfect. You can correlate the predicted values of the j-th manifest variable with the actual values of that variable. What you get is a correlation coefficient: the multiple correlation coefficient. Square it and you get the SMC.

The communality problem
•However, in order to obtain the population SMCs, we need to know P in the first place. Most often, we don’t.
•In practice, we can apply the same procedure to a sample correlation matrix, R, in order to obtain sample SMCs. Since, in reality, we usually work with sample correlation matrices, let’s slowly shift gears towards thinking more about a sample correlation matrix R and less about the population correlation matrix P.

Working with a sample correlation matrix
•So far, we have studied factor analysis limiting ourselves to the ideal scenario in which we know the population correlation matrix, P. Moreover, we only considered the case where the model holds exactly in the population.
•Now, let’s consider the real world, in which we do not have access to P but we do have access to R. In this real-world scenario, we are not even sure that the sample correlation matrix R is drawn from a population with a correlation matrix P for which the model holds.
•As before, let’s just consider the uncorrelated / orthogonal model for now.
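The squared multiple correlations described above can be computed directly from a correlation matrix. What follows is a minimal sketch, assuming NumPy; the function name and the 3-variable matrix R are hypothetical and are not the course's example data. It relies on the standard identity that the SMC of variable j equals 1 - 1/r^jj, where r^jj is the j-th diagonal element of the inverse of R, which gives the same value as regressing variable j on the other (p - 1) variables.

```python
import numpy as np

def squared_multiple_correlations(R):
    """Return the SMC of each variable with all remaining variables.

    Uses the identity SMC_j = 1 - 1 / [R^{-1}]_jj, which requires R to be
    invertible (hypothetical helper, not from the course materials).
    """
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    return 1.0 - 1.0 / np.diag(R_inv)

# Hypothetical sample correlation matrix for three manifest variables.
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

smc = squared_multiple_correlations(R)
print(smc)  # one prior communality estimate per manifest variable
```

Each value can then be placed on the diagonal of R as an initial approximation to the corresponding communality.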
Working with a sample correlation matrix
•Every element in the residual matrix tells us how far the model-implied (predicted) value of that element is from its observed value.
•Alright, so, again, we don’t have the population correlation matrix P that we used for all the computations and methods covered before. What are we going to do?
•Of course, we’re going to pretend the problem isn’t there, and we’ll start by doing things in exactly the same way.

Working with a sample correlation matrix
•Again, we will obtain some eigenvalues and some eigenvectors. However, in this case (not having a population correlation matrix, not being sure the model holds exactly in the population), we will generally not obtain an eigen-solution where the (p - m) smallest eigenvalues are zero.
•Thus, we cannot rely on the number of non-zero eigenvalues to show us the “true” number of factors (m). Instead, we will have to choose m ourselves beforehand, based on our best judgment (more on that later).

Example
•The solution produced a residual matrix with the minimum sum of squares, conditional on the prior communality estimates. If the prior communality estimates were different, a different residual matrix would satisfy the RSS (residual sum of squares) criterion.

Short review

Iterative procedure
•That’s really all there is (in principle) to OLS (ordinary least squares).
•By the way, the RSS function (the formula we have seen before) is a discrepancy function: it quantifies the distance between the observed and model-implied correlation matrices. In other words, it expresses the degree of lack of model fit.
•Being a discrepancy function, it is always greater than or equal to zero, and it is zero only when the observed and model-implied correlation matrices are identical.

Heywood cases
•One nasty thing can happen when using OLS estimation.
•In the course of the iterations, some communalities can become greater than one. Equivalently, the corresponding unique variances become less than zero (because in a standardized solution, the communality and the unique variance of a manifest variable add up to one).
•But there’s no such thing as a negative variance. Thus, such a solution would be nonsensical and unacceptable. We call these occurrences Heywood cases.

Heywood cases
•If you’re using smart software, you should be notified whenever a Heywood case occurs.
•If you’re using smart software, it can help you circumvent the problem by constraining the associated unique variance to be greater than or equal to zero.

Summary
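As a recap, the pieces of this section (prior communalities from SMCs, an eigen-decomposition of the reduced correlation matrix, iteration, the RSS discrepancy, and a check for Heywood cases) can be put together in a short sketch. This is only an illustration under stated assumptions, not the course's software: it assumes NumPy, uses an iterated principal-factor scheme as one common way to carry out the OLS iterations, and defines RSS as the sum of squared off-diagonal residuals with each pair counted once. The function name, tolerance, and the 4-variable one-factor example are hypothetical.

```python
import numpy as np

def iterated_principal_factors(R, m, max_iter=500, tol=1e-8):
    """Sketch of iterative OLS fitting of the orthogonal factor model.

    R : correlation matrix (p x p), m : number of factors chosen beforehand.
    Returns loadings, communalities, RSS, and a Heywood-case flag per variable.
    """
    R = np.asarray(R, dtype=float)
    # Prior communality estimates: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the unit diagonal.
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(R_reduced)
        top = np.argsort(eigvals)[::-1][:m]  # indices of the m largest eigenvalues
        # Loadings: eigenvectors scaled by the square roots of their eigenvalues.
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        h2_new = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    # Residual matrix: observed minus model-implied correlations.
    residual = R - loadings @ loadings.T
    # Assumption: RSS sums squared off-diagonal residuals, each pair once.
    rss = np.sum(np.tril(residual, k=-1) ** 2)
    heywood = h2 > 1.0  # communality above one means negative unique variance
    return loadings, h2, rss, heywood

# Hypothetical example: a correlation matrix that a one-factor model fits exactly.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

lam, h2, rss, heywood = iterated_principal_factors(R, m=1)
print(np.abs(lam[:, 0]), rss, heywood.any())
```

Because the example matrix is built from a model that holds exactly, the recovered loadings match the generating ones up to sign and the RSS is essentially zero. With a real sample correlation matrix the RSS will generally be positive, and the Heywood flag is worth checking after every fit.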