# Structural equation modeling

Structural equation modeling (SEM) is a statistical technique for building and testing statistical models, which are often causal models. It is a hybrid technique that encompasses aspects of confirmatory factor analysis, path analysis and regression, which can be seen as special cases of SEM.

SEM encourages confirmatory, rather than exploratory, modelling; thus, it is suited to theory testing, rather than theory development. It usually starts with a hypothesis, represents it as a model, operationalises the constructs of interest with a measurement instrument and tests the model. With an accepted theory or otherwise confirmed model, one can also use SEM inductively by specifying a model and using data to estimate the values of free parameters. Often the initial hypothesis requires adjustment in light of model evidence, but SEM is rarely used purely for exploration.

Among its strengths is the ability to model constructs as latent variables: variables that are not measured directly but are estimated in the model from measured variables assumed to 'tap into' them. This allows the modeller to capture unreliability of measurement explicitly in the model, which in theory allows the structural relations between latent variables to be estimated accurately.
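The effect of measurement unreliability on an observed relationship can be illustrated with the classical correction for attenuation. This is only a minimal sketch of the underlying idea, not the estimator an SEM program uses; the function names and the reliability values are hypothetical and chosen for illustration.

```python
import math


def attenuated_correlation(true_corr, rel_x, rel_y):
    """Correlation expected between two error-prone measures.

    rel_x and rel_y are the reliabilities (between 0 and 1) of the
    two measured variables (illustrative inputs, not from the text).
    """
    return true_corr * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_corr, rel_x, rel_y):
    """Classical correction for attenuation: undo the shrinkage above."""
    return observed_corr / math.sqrt(rel_x * rel_y)


# Hypothetical example: a latent correlation of 0.5 measured with
# reliabilities 0.8 and 0.7 appears weaker in the observed data.
observed = attenuated_correlation(0.5, 0.8, 0.7)
recovered = disattenuated_correlation(observed, 0.8, 0.7)
print(observed, recovered)
```

An SEM with latent variables achieves a similar correction inside the model, by estimating the error variance of each indicator together with the structural relations.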

SEM is an extension of the general linear model that simultaneously estimates relationships between multiple independent, dependent and latent variables.

An alternative technique for specifying structural models, based on partial least squares, has been implemented in software such as LVPLS (Latent Variable Partial Least Square) and PLSGraph. Some consider this approach better suited to exploratory modelling.

More ambitiously, the TETRAD project aims to develop a way to automate the search for possible causal models from data.

##  Steps in performing SEM analysis

###  Model specification

Since SEM is a confirmatory technique, the model must be specified correctly based on the type of analysis that the modeller is attempting to confirm. There are usually two main parts to SEM: the structural model showing dependencies between latent and exogenous variables, and the measurement model showing the relations between the latent variables and their indicators. Confirmatory factor analysis models, for example, contain only the measurement part, while linear regression can be viewed as an SEM that has only the structural part. Specifying the model distinguishes two kinds of relationships between variables: those that are hypothesised and are therefore left 'free' to be estimated from the data, and those whose values are already known, for example from previous studies, and are therefore 'fixed' in the model.
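The two parts of the model can be written compactly in matrix form. The notation below follows the common LISREL-style convention and is an assumption of this sketch, since the text above does not fix any symbols: $\eta$ and $\xi$ are endogenous and exogenous latent variables, $y$ and $x$ their indicators, and $\zeta$, $\varepsilon$, $\delta$ are error terms.

```latex
% Structural part: dependencies among the latent variables
\eta = B\,\eta + \Gamma\,\xi + \zeta

% Measurement part: relations between latent variables and their indicators
y = \Lambda_y\,\eta + \varepsilon
x = \Lambda_x\,\xi + \delta
```

In this notation, a 'free' parameter is an entry of $B$, $\Gamma$, $\Lambda_y$ or $\Lambda_x$ (or of the error covariance matrices) that is estimated from the data, while a 'fixed' parameter is an entry set to a given value, often 0 or 1, before estimation.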

###  Estimation of free parameters

Parameters are estimated by comparing the actual covariance matrices, which represent the relationships between variables, with the estimated covariance matrices of the best-fitting model. The estimates are obtained by numerical optimization of a fit criterion, as provided by maximum likelihood, weighted least squares or asymptotically distribution-free methods.
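To make the comparison of covariance matrices concrete, the following sketch evaluates the maximum likelihood fit criterion for a one-factor model. It is a minimal illustration in pure Python under stated assumptions: the one-factor model, the loadings and the helper names are all hypothetical, and no optimization is performed (a real SEM program would search numerically for the parameter values that minimise this quantity).

```python
import math


def implied_cov(loadings, uniquenesses, factor_var=1.0):
    """Model-implied covariance of a one-factor model:
    Sigma = factor_var * l l' + diag(uniquenesses)."""
    p = len(loadings)
    return [[factor_var * loadings[i] * loadings[j]
             + (uniquenesses[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def logdet_and_inverse(a):
    """Gauss-Jordan elimination with partial pivoting.
    Returns (log|det A|, inverse of A); assumes A is nonsingular."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        logdet += math.log(abs(piv))
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return logdet, [row[n:] for row in aug]


def ml_discrepancy(sample_cov, model_cov):
    """F_ML = log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = len(sample_cov)
    logdet_sigma, sigma_inv = logdet_and_inverse(model_cov)
    logdet_s, _ = logdet_and_inverse(sample_cov)
    trace = sum(sample_cov[i][k] * sigma_inv[k][i]
                for i in range(p) for k in range(p))
    return logdet_sigma + trace - logdet_s - p


# Hypothetical "actual" covariance matrix, generated from known loadings.
S = implied_cov([0.8, 0.7, 0.6], [0.36, 0.51, 0.64])

# The criterion is essentially zero at the generating parameters and
# larger for parameter values that reproduce S less well.
f_true = ml_discrepancy(S, implied_cov([0.8, 0.7, 0.6], [0.36, 0.51, 0.64]))
f_wrong = ml_discrepancy(S, implied_cov([0.5, 0.5, 0.5], [0.75, 0.75, 0.75]))
print(f_true, f_wrong)
```

Weighted least squares and asymptotically distribution-free methods replace `ml_discrepancy` with a different measure of distance between the two matrices; the overall logic of searching for the closest model-implied matrix is the same.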

This is best accomplished by using a specialized SEM analysis program, such as SPSS' AMOS, EQS, LISREL, Mplus, Mx, or SAS PROC CALIS.

###  Assessment of fit

Using an SEM analysis program, one can compare the estimated matrices representing the relationships between variables in the model with the actual matrices. Individual factors within the estimated model are also examined to see how well the proposed model fits the driving theory.
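SEM programs usually summarise this comparison with fit statistics. The sketch below computes two widely used ones, the model chi-square and the RMSEA, from a minimised maximum likelihood fit value. The text above does not name specific indices, so the choice of statistics, the function name and the example numbers are assumptions made for illustration; some programs use N rather than N - 1 in these formulas.

```python
import math


def fit_summary(f_min, n_obs, n_vars, n_free):
    """Summarise model fit from a minimised ML fit value.

    f_min  : minimised value of the ML fit criterion
    n_obs  : sample size N
    n_vars : number of observed variables p
    n_free : number of free parameters in the model
    """
    # Degrees of freedom: distinct variances and covariances minus free parameters.
    df = n_vars * (n_vars + 1) // 2 - n_free
    chi_square = (n_obs - 1) * f_min
    if df > 0:
        rmsea = math.sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))
    else:
        rmsea = None  # undefined for a saturated model (df = 0)
    return {"chi_square": chi_square, "df": df, "rmsea": rmsea}


# Hypothetical example: 6 observed variables, 13 free parameters, N = 201.
summary = fit_summary(f_min=0.05, n_obs=201, n_vars=6, n_free=13)
print(summary)
```

A chi-square that is small relative to its degrees of freedom, and an RMSEA close to zero, indicate that the estimated matrices reproduce the actual matrices closely.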

###  Model modification

The model may need to be modified to improve the fit, thereby estimating the most likely relationships between variables.

###  Interpretation and communication

The model is then interpreted and claims about the constructs are made based on the best fitting model.

Caution should always be taken when making claims of causality, even when experimentation or time-ordered studies have been done. SEM is most commonly used with data collected at one time point through passive observation. Collecting data at multiple time points and using an experimental or quasi-experimental design can help rule out certain rival hypotheses, but even a randomized experiment cannot rule out all such threats to causal inference. Good fit by a model consistent with one causal hypothesis does not rule out equally good fit by another model consistent with a different causal hypothesis. However, careful research design can help distinguish such rival hypotheses.

###  Replication and revalidation

All model modifications should be replicated and revalidated before interpreting and communicating the results.

• Invariance
• Multiple group comparison
• Modeling growth
• Relations to other types of advanced models (multilevel models; item response theory models)
• Alternative estimation and testing techniques
• Robust inference
• Interface with survey estimation
