Introduction to CFA
PSY544 – Introduction to Factor Analysis
Week 11

Introduction
•So far, all we have covered is exploratory factor analysis (EFA) – or “unrestricted” factor analysis.
•We’ve covered a huge chunk of material and you should be proud of yourself! You have all the knowledge you need to become master exploratory factor analysts.
•You’ve also managed to see me twice a week and not jump out of the window.

Introduction
•Today, we begin with the rest of the course, which will cover confirmatory factor analysis (CFA) – or “restricted” factor analysis.
•The difference between EFA and CFA lies in the incorporation of a prior hypothesis about the factor structure into the model specification.
•In EFA, the analyst seeks to explore the number and nature of the major common factors. Rotation to simple structure is usually necessary.
•In CFA, the analyst has a specific prior hypothesis about the number and nature of the major common factors. This hypothesis is directly incorporated into the model specification. No rotation is involved.

Introduction
•Unlike in EFA, it is no longer possible to obtain estimates of the factor loadings directly once the unique factor variances (or communalities) are estimated.
•All parameters in CFA have to be estimated simultaneously by numerically minimizing some discrepancy function.
•As a result, CFA tends to be slower than EFA, even though the number of estimated parameters is smaller.

Software
•A plethora of software exists for confirmatory factor analysis or – more generally – for structural equation modeling (of which FA is a special case).
•LISREL, EQS, Mplus, RAMONA, SEPATH, Mx, AMOS…
•…meh. In this course, we will use R and the lavaan package. For all its quirks and steep learning curve, it’s a modern piece of software that allows for great flexibility. Oh, and you don’t have to sell a kidney to work with it – it’s free.

Exploratory (Unrestricted) Factor Analysis
•As you already know, in EFA, there are typically no (solid) prior ideas about the number of the common factors or their nature (the position of zero loadings).
•Sure, the analyst might have *some* ideas about the variables being analyzed, but these don’t need to be expressed, nor do they need to be correct.
•If the analyst conducts a blind rotation (like Quartimax) of the estimated factors, they will never know whether the failure to see non-zero loadings where expected is because their hypothesis is incorrect or because the rotational criterion is inadequate for the given situation.

Exploratory (Unrestricted) Factor Analysis
•Also, in EFA, the decision when to stop is based heavily on the analyst’s judgement, and the entire thing is largely data-driven rather than theory-driven.
•That’s fine, as long as it’s acknowledged as such.

Confirmatory (Restricted) Factor Analysis
•CFA should be used only when there is a solid prior hypothesis about the number and nature of the common factors.
•It’s totally fine (actually preferable in a lot of cases) to have several competing hypotheses.
•The analyst must be able to specify the number and position of zero loadings before the analysis. After that, the corresponding models are fit to the data and the degree of model-data fit is assessed, which suggests the extent to which each prior hypothesis agrees with empirical reality.

Confirmatory (Restricted) Factor Analysis
•CFA is not a data-driven enterprise. It’s theory-driven.
•CFA can seduce you into using it in a data-driven way. That’s dangerous, because it can lead to “confirmatory” models that are merely statistical artifacts.
•A confirmatory model is still a model. As such, it is nothing but an approximation. Make sure to be just as cautious in this regard as you would be while performing EFA.

Restrictions
•Restrictions usually encode aspects of a prior hypothesis; they are how that hypothesis enters the model.
•They do affect the implied correlation / covariance matrix, hence they do affect the fit of the model, and so are testable.
•Because CFA involves imposing restrictions, confirmatory models typically fit the data worse than exploratory models, which are free of such restrictions.

The CFA model
•How many degrees of freedom does our model have?
•Well, first of all, let’s count the number of parameters we are freely estimating. That’s six factor loadings, six unique variances, and one correlation between factors, for a total of 13 parameters to estimate.
•Our data is a 6 x 6 correlation / covariance matrix, which has [6 * (6+1)]/2 = 21 unique elements – the number of distinct pieces of information available before any parameter is estimated.
•In our case, the number of degrees of freedom is 21 – 13 = 8.

Path diagrams
•Path diagrams are a standard way to communicate a CFA model.
•Let’s spend some time on the basics.

Path diagrams
•Rectangles denote manifest variables.
•Circles denote latent variables.
[Diagram: Age (rectangle), Int (circle)]

Path diagrams
•One-sided, straight arrows denote a regression path.
•Double-sided, curved arrows denote a correlation / covariance (pretend like the line is curved, OK?).
[Diagram: a one-sided arrow between Age and Int; a double-sided arrow between Age and Height]

Path diagrams
•A path diagram should contain as much model-related information as possible, ideally all of it.
•Each arrow stands for a parameter, and so should be labeled with the value of that particular parameter.

Path diagrams
[Figure: example path diagram. Source: http://tx.shu.edu.tw/~purplewoo/Literature/!DataAnalysis/confirmatory%20factor%20analysis-intro.files/path1.gif]

Path diagrams
[Figure: example path diagram. Source: https://openi.nlm.nih.gov/imgs/512/347/3063185/PMC3063185_1477-7525-9-12-1.png]

Path diagrams
•Certain kinds of software will allow you to specify a model using only path diagrams (LISREL, for instance), while for some software, path diagrams are the only way to specify a model (AMOS, I think). However, even in AMOS, the software will “translate” the information contained in the path diagram into the model matrices.

Estimation
•As previously with EFA, estimation of parameters in CFA follows the basic principles of minimum discrepancy estimation.
•We are looking for a vector of parameters for which the following is true: the model-implied covariance / correlation matrix has minimum “distance” from the observed covariance / correlation matrix (in other words, the discrepancy function value is at a minimum).

Estimation
•Conceptually speaking:
•Ordinary least squares (OLS) – a simple sum of squared differences between the elements of the observed and the model-implied matrices.
•Generalized least squares (GLS) – the differences between the observed and the model-implied matrices are weighted by corresponding elements in the observed matrix (a discrepancy in a larger element is penalized less than a discrepancy in a smaller element).

Estimation
•Of course, the constrained parameters are not being estimated.
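
The degrees-of-freedom count and the idea of a discrepancy function can be sketched in a few lines of plain Python. This is only an illustration, not the course's lavaan workflow. It assumes a two-factor model in which variables 1-3 load only on the first factor and variables 4-6 only on the second, with factor variances fixed to 1. All numeric values and the function names (implied_cov, ols_discrepancy) are made up for the example. The OLS discrepancy is taken here as the plain sum of squared differences over all matrix elements, which is one common form, and no minimization is performed.

```python
# Hypothetical two-factor CFA model for six variables (illustrative values only).
# The zero loadings are the restrictions: each variable loads on exactly one factor.
FACTOR_OF = [0, 0, 0, 1, 1, 1]  # assumed factor membership of variables 1..6


def implied_cov(loadings, factor_corr, uniques):
    """Model-implied matrix Sigma = Lambda * Phi * Lambda' + Psi."""
    p = len(loadings)
    phi = [[1.0, factor_corr], [factor_corr, 1.0]]  # factor variances fixed to 1
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # With one non-zero loading per row, Lambda*Phi*Lambda' reduces to this product.
            sigma[i][j] = loadings[i] * phi[FACTOR_OF[i]][FACTOR_OF[j]] * loadings[j]
        sigma[i][i] += uniques[i]  # Psi is diagonal
    return sigma


def ols_discrepancy(observed, implied):
    """Sum of squared differences between observed and model-implied elements."""
    return sum(
        (o - m) ** 2
        for row_o, row_m in zip(observed, implied)
        for o, m in zip(row_o, row_m)
    )


# Degrees of freedom: unique data elements minus freely estimated parameters.
p = 6
n_unique = p * (p + 1) // 2   # 21 unique elements in a 6 x 6 symmetric matrix
n_free = 6 + 6 + 1            # loadings + unique variances + factor correlation
df = n_unique - n_free
print(df)  # 8

# Pretend the "observed" matrix was generated exactly by these parameter values.
true_loadings = [0.8, 0.7, 0.6, 0.8, 0.7, 0.6]
true_uniques = [1 - l ** 2 for l in true_loadings]
observed = implied_cov(true_loadings, 0.3, true_uniques)

# At the generating values the discrepancy is zero; elsewhere it is positive.
f_true = ols_discrepancy(observed, implied_cov(true_loadings, 0.3, true_uniques))
f_wrong = ols_discrepancy(observed, implied_cov([0.5] * 6, 0.0, [0.75] * 6))
```

An estimation routine would search over the 13 free parameters for the values that make the discrepancy as small as possible. The restricted loadings (the zeros) never enter that search, which is the sense in which constrained parameters are not estimated.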