Multivariable "structural" models of development
Idea: Just as standard regression models allow prediction of a dependent variable on the basis of independent variables, structural models can allow a sequence of predictive steps from root ("exogenous") through to highest-level variables. Although this kind of model seems to illuminate issues about factors that build up over the life course, there are strong criticisms of using such models to make claims about causes.
Guidelines for annotations
Notes and annotations from 2007 course
Initial notes from PT
The mini-lecture will come at the start of class 11, not the end of class 10.
Cases: Kendler et al. 2002 on pathways to depression in women: Notice the high R^2 and the way the authors tease out different kinds of pathways to depression from the model they fit to their data.
Freedman 2005: Freedman is a statistician who questions whether structural models can be thought of as causal models and tries hard to make his questioning accessible (i.e., with a minimum of technical language, though not zero).
Ou's 2005 synthesis of pathways from pre-school programs to later outcomes: Notice the different kinds of networks Ou reviews in the literature before presenting her own analysis.
During the class, we might look first at Kendler's and Ou's diagrams,
then do Q&A on the technical aspects of path analysis and SEM primed by the notes below,
then work our way through Freedman's critique.
PT's first attempt at a non-technical introduction to path analysis and structural equation modeling (alternative expositions welcome)
Path analysis is a data analysis technique that quantifies the relative contributions of variables (“path coefficients”) to the variation in a focal variable once a certain network of interrelated variables has been specified (Lynch & Walsh 1998, 823). Some of these contributions are direct and some mediated through other variables, i.e., indirect. Although some researchers interpret “contribution” in causal terms (e.g., Pearl 2000, 135 & 344-5), others criticize such an interpretation (e.g., Freedman 2005). Here, contribution refers neutrally to the term of an additive model fitted to data.
The conceptual starting point for path analysis is an additive regression model that associates the focal (“dependent”) variable with several other measured (“independent” or “exogenous”) variables.
(The vertical lines in these figures indicate that the separate horizontal lines are combined together.)
X1 ----|
X2 ----|----> Y
X3 ----|
Technically, the additive model is transformed by subtracting the mean from every term, squaring the expression and averaging (so it becomes an equation for the variance), and dividing by the variance of the focal (“dependent”) variable. The result is the “equation of complete determination,” in which each regression coefficient is multiplied by the SD of its “independent” variable and divided by the SD of the focal variable to arrive at the path coefficient.
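The conversion from regression coefficients to path coefficients can be sketched numerically. This is a minimal illustration on simulated (hypothetical) data, not an analysis from any of the readings. It assumes ordinary least squares as the fitting method; the helper names solve and cov are introduced here for illustration only. With correlated predictors, the equation of complete determination includes cross terms weighted by the correlations between predictors, plus a residual path.

```python
import random
from statistics import fmean, pstdev

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def cov(u, v):
    """Population covariance (the mean is subtracted from every term)."""
    mu, mv = fmean(u), fmean(v)
    return fmean([(a - mu) * (b - mv) for a, b in zip(u, v)])

# Simulated, hypothetical data: three measured variables X1..X3 and a focal variable Y.
random.seed(1)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]   # correlated with X1
x3 = [random.gauss(0, 2) for _ in range(n)]
y = [1.0 + 0.8 * a - 0.5 * b + 0.3 * c + random.gauss(0, 1)
     for a, b, c in zip(x1, x2, x3)]
xs = [x1, x2, x3]

# Least-squares regression coefficients from the normal equations on centered data.
b = solve([[cov(u, v) for v in xs] for u in xs], [cov(u, y) for u in xs])

# Path coefficient = regression coefficient * SD(X) / SD(Y).
sd_y = pstdev(y)
p = [bi * pstdev(x) / sd_y for bi, x in zip(b, xs)]

# Residual path coefficient = SD(residual) / SD(Y).
intercept = fmean(y) - sum(bi * fmean(x) for bi, x in zip(b, xs))
resid = [yi - intercept - sum(bi * x[i] for bi, x in zip(b, xs))
         for i, yi in enumerate(y)]
p_e = pstdev(resid) / sd_y

# Equation of complete determination: sum over i, j of p_i * p_j * r_ij, plus p_e^2.
r = [[cov(u, v) / (pstdev(u) * pstdev(v)) for v in xs] for u in xs]
total = sum(p[i] * p[j] * r[i][j] for i in range(3) for j in range(3)) + p_e ** 2

print("path coefficients:", [round(v, 3) for v in p])
print("residual path:", round(p_e, 3))
print("complete determination total:", round(total, 6))
```

The total comes out at 1 (up to rounding) because least-squares residuals are uncorrelated with the fitted part, so the variance of Y splits cleanly into the two pieces.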
The next step is to consider more than one focal, “endogenous” variable and networks of exogenous and endogenous variables that you have reason to think are associated with one another. Indeed, the focal variable of one regression may be among the variables associated with a second focal variable and so on. In the figure below X3 has a direct link with Y2 and an indirect one through Y1.
X1 ----|
X2 ----|----> Y1 -|--> Y2
X3 ----|------------|
The software (e.g., LISREL) can solve these linked regression equations, but it is up to you to compare the results using the network you specify with plausible (theoretically-justified) alternatives that may link exogenous, independent variables and endogenous variables differently. Unlike multiple regression, we do not arrive at our idea of what should be in the regression by adding or subtracting variables in some stepwise procedure.
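For a small network with only measured variables, the linked regressions can be sketched without specialized software. The following is a minimal illustration on simulated (hypothetical) data for the network in the figure above, assuming each equation is fitted separately by least squares on standardized variables. This is not LISREL's estimation method, and the helper names (standardize, corr, solve, path_coefficients) are introduced here for illustration only.

```python
import random
from statistics import fmean, pstdev

def standardize(v):
    m, s = fmean(v), pstdev(v)
    return [(a - m) / s for a in v]

def corr(u, v):
    # Inputs are already standardized, so the mean product is the correlation.
    return fmean([a * b for a, b in zip(u, v)])

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def path_coefficients(predictors, focal):
    """Standardized regression of one focal variable on its specified predictors."""
    return solve([[corr(u, v) for v in predictors] for u in predictors],
                 [corr(u, focal) for u in predictors])

# Simulated, hypothetical data following the network X1, X2, X3 -> Y1; Y1, X3 -> Y2.
random.seed(2)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.4 * a + random.gauss(0, 1) for a in x1]
y1 = [0.6 * a + 0.4 * b + 0.5 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]
y2 = [0.7 * a + 0.3 * c + random.gauss(0, 1) for a, c in zip(y1, x3)]
x1, x2, x3, y1, y2 = (standardize(v) for v in (x1, x2, x3, y1, y2))

# One regression per endogenous variable, using only the links the network specifies.
p_y1 = path_coefficients([x1, x2, x3], y1)   # paths X1->Y1, X2->Y1, X3->Y1
p_y2 = path_coefficients([y1, x3], y2)       # paths Y1->Y2, X3->Y2

direct = p_y2[1]                 # X3 -> Y2
indirect = p_y1[2] * p_y2[0]     # X3 -> Y1 -> Y2

print("direct X3->Y2:", round(direct, 3))
print("indirect X3->Y1->Y2:", round(indirect, 3))
# The fitted paths reproduce the observed correlation between X3 and Y2:
print(round(corr(x3, y2), 6), round(direct + p_y2[0] * corr(x3, y1), 6))
```

Changing the list of predictors passed to path_coefficients is how a different (theoretically-justified) network would be specified for comparison. The numbers themselves say nothing about which network is causally right.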
Structural equation modeling extends path analysis to include latent (a.k.a. unmeasured) variables or “constructs.” These latent variables are sometimes the presumed real underlying variable of which the measured one is an imperfect marker. For example, birth weight at full term and the neonate's APGAR scores might be the measured variables but the model might include degree of fetal under-nutrition as a latent variable. Latent variables can also be constructed by the software in the same way that they are in factor analyses, namely, as economical (dimension-reducing) linear combinations of measured variables. Calling the networks of linked variables “structural” is meant to suggest that we can give the pathways causal interpretations, but SEM and path analysis have no trick that overcomes the problems that regression and factor analyses have in exposing causes.
This section is not needed for understanding the papers for this week. However, looking ahead to studies of heritability (part of week 12), a field in which path analysis originated, there are no measured variables except the observed focal variable (e.g., height). Path analysis can still be used if we convert the additive model on which any given Analysis of Variance (AOV or ANOVA) is based into an additive model of constructed variables that take the values of the contributions fitted to the first model. For example, in an agricultural evaluation trial of many varieties replicated one or more times in each of many locations, the AOV model is
Yijk = M +Vi +Lj +VLij +Eijk (eqn. 1)
where Yijk denotes the measured trait y for the ith variety in the jth location and kth replication;
M is a base level for the trait;
Vi is the contribution of the ith variety;
Lj is the contribution of the jth location;
VLij is an additional contribution from the i,jth variety-location combination—in statistical terms, the “variety-location-interaction” contribution; and
Eijk is a noise contribution adding to the trait measurement.
The path model equivalent to equation 1 is
Yx = M +Z1x +Z2x +Z3x +Ex (eqn. 2)
where
Y is the measured trait as before and x denotes the replicates
Z1x = Vi if x is a replicate of variety i, or 0 otherwise
Z2x = Lj if x is a replicate in location j, or 0 otherwise
Z3x = VLij if x is a replicate of variety i in location j, or 0 otherwise
Ex = Eijk where x is replicate k of variety i in location j
The path coefficients are then set to equal the square root of the ratio of the variance of the contribution (Vi, etc.) to the total variance for the trait (Y). The equation of complete determination becomes
1 = Sum (over w's) of var(Zw) / var(Y) (eqn. 3)
where w denotes the different contributions in the Analysis of Variance model.
For the agricultural trial this equation might be written
1 = [var(V) + var(L) + var(VL) + var(E)] / var(Y) (eqn. 4)
where var(V) = variance of the Vi terms, etc.
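Equations 1 to 4 can be checked numerically. The following is a minimal sketch on a simulated (hypothetical) balanced trial with 4 varieties, 3 locations and 2 replicates. It assumes the contributions are fitted by simple averaging, which is only appropriate for a balanced design. All sizes and effect values are made up for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

random.seed(3)
n_var, n_loc, n_rep = 4, 3, 2

# Hypothetical trial data keyed by (variety i, location j, replicate k).
v_true = [random.gauss(0, 2.0) for _ in range(n_var)]
l_true = [random.gauss(0, 1.5) for _ in range(n_loc)]
trial = {}
for i in range(n_var):
    for j in range(n_loc):
        vl_true = random.gauss(0, 1.0)  # variety-location interaction
        for k in range(n_rep):
            trial[(i, j, k)] = (10.0 + v_true[i] + l_true[j] + vl_true
                                + random.gauss(0, 1.0))

# Fit eqn. 1 by averaging (valid because the design is balanced).
M = fmean(list(trial.values()))
V = [fmean([y for (a, _, _), y in trial.items() if a == i]) - M for i in range(n_var)]
L = [fmean([y for (_, b, _), y in trial.items() if b == j]) - M for j in range(n_loc)]
cell = {(i, j): fmean([trial[(i, j, k)] for k in range(n_rep)])
        for i in range(n_var) for j in range(n_loc)}
VL = {(i, j): cell[(i, j)] - M - V[i] - L[j] for (i, j) in cell}

# Eqn. 2: each constructed variable takes, for replicate x, the fitted contribution.
y = list(trial.values())
Z = {
    "V": [V[i] for (i, j, k) in trial],
    "L": [L[j] for (i, j, k) in trial],
    "VL": [VL[(i, j)] for (i, j, k) in trial],
    "E": [trial[(i, j, k)] - cell[(i, j)] for (i, j, k) in trial],
}

# Eqns. 3 and 4: variance shares, and path coefficients as their square roots.
var_y = pvariance(y)
share = {name: pvariance(z) / var_y for name, z in Z.items()}
path = {name: sqrt(s) for name, s in share.items()}
total = sum(share.values())

for name in Z:
    print(name, "share:", round(share[name], 3), "path:", round(path[name], 3))
print("sum of shares:", round(total, 6))
```

The shares sum to 1 here because, in a balanced design, the fitted contributions are uncorrelated with one another across replicates. In an unbalanced design the constructed variables would generally be correlated and cross terms would appear.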
In human studies the var(VL) is ignored or discounted [which I think is a problem, PT] and this is expressed as
1 = heritability + shared environmental effect + non-shared environmental effect (eqn. 5)
When the same trait is observed in two relatives, their separate path analyses can be linked in one network and the correlation between the relatives calculated (Lynch & Walsh 1998, 826)—provided it is assumed that the contributions (and path coefficients) apply to both and that the noise contributions are uncorrelated. If we have data on correlations for different kinds of relatives (e.g., identical vs. fraternal twins), we can estimate the relative size of the contributions in equations such as 4 and 5. That’s the crux of heritability studies.
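One common way to carry out this estimation for twins, not spelled out in these notes, is Falconer's formulas. The sketch below assumes the additive model of eqn. 5 with no interaction term, that identical (MZ) twins share all of the heritable contribution and fraternal (DZ) twins half of it, and that both kinds of twins share the shared-environment contribution equally, so that r_MZ = h2 + c2 and r_DZ = h2/2 + c2. The function name and the correlations used are hypothetical, chosen for illustration only.

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Solve r_MZ = h2 + c2 and r_DZ = h2/2 + c2, with h2 + c2 + e2 = 1."""
    h2 = 2 * (r_mz - r_dz)   # heritability
    c2 = 2 * r_dz - r_mz     # shared environmental effect
    e2 = 1 - r_mz            # non-shared environmental effect (includes noise)
    return h2, c2, e2

# Hypothetical twin correlations, not taken from any study.
h2, c2, e2 = ace_from_twin_correlations(r_mz=0.8, r_dz=0.5)
print("heritability:", round(h2, 3))
print("shared environment:", round(c2, 3))
print("non-shared environment:", round(e2, 3))
```

The three estimates sum to 1 by construction, as in eqn. 5. Every assumption listed above is built into the arithmetic, so the estimates are only as good as those assumptions.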
References
Freedman, D. A. (2005). Linear statistical models for causation: A critical review. Encyclopedia of Statistics in the Behavioral Sciences. B. Everitt and D. Howell. Chichester, Wiley.
Lynch, M. and B. Walsh (1998). Genetics and Analysis of Quantitative Traits. Sunderland, MA, Sinauer.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge, Cambridge University Press.
http://en.wikipedia.org/wiki/Apgar_score
Annotations on common readings
Annotated additions by students
This week’s readings seemed to build on the previous week’s life-course theory, in that it focused on factors that accumulate over the life-course to effect and affect health and developmental outcomes.
Ou
This article addresses the aims and importance of early education programs in preparing students for academic achievement, as well as emotional, psychological and behavioral development among young children. Such programs also serve to prepare parents for their role in supporting academic engagement and achievement over their children’s early life-cycle. Drawing on a longitudinal study, this research sought to test four models (and associated theories) regarding the mechanisms by which early education programs have an impact. These four models are socialization, parent involvement, teacher expectations and cognitive. It was easy for me to conceptualize the early education programs as building blocks which had to be arranged just so, and firmly in place, to support other life events and interventions in later years of childhood and adolescence. As the logic models illustrated in this article indicate, there are multiple variables that influence one another in complex ways and that, in turn, influence child development and education. It’s beyond the scope of this annotation to attempt to explain the various pathways and mediators, but this article serves to underscore the importance of an early, comprehensive approach involving teachers, parents and early education specialists.
---
This study discusses pathways that could possibly illuminate the connection between participation in early intervention programs and later academic achievement using a sample from the Chicago Longitudinal Study, which is an investigation of high poverty neighborhoods in Chicago and the low income children who reside there. The study tests pathways from early intervention programs to educational attainment. Five components were taken from previous studies: “cognitive advantage, family support, social adjustment, motivational advantage, and school support.” These components were found to be predictors of academic achievement at 22 years of age. LISREL analyses showed that the connection between participation in the Chicago Child-Parent Center (CPC) program in early childhood and later academic achievement was adequately predicted by cognitive advantage effects, and by family support and school support effects. The results showed that environmental factors like family and school, as well as individual attributes that may be affected by the intervention, serve important roles in forecasting academic outcomes. The discussion centered on how advocating for family/school partnerships, with a focus on family influences in early intervention programs, as well as other environmental influences, could maintain and improve the outcomes of early intervention and promote greater academic achievement in adulthood. (CH'09)
Rini et al
The focus of this research was on the impact prenatal stress and anxiety have on birth outcomes, namely adverse outcomes. The authors describe three categories of resources that mediate stress: degree of self-esteem, degree of optimism about one’s future, and perceived control over outcomes. It seems to me that these are likely resources which younger women, particularly lower-income women, do not possess. This article was also of interest (and value) to me because the researchers created an index for these three categories of stress, an approach I will need to employ in my dissertation with respect to health status and outcomes. The findings indicate that such stress can affect the gestational age and birth weight of babies; however, when all other factors are controlled for, stress in and of itself does not. Rather, it seems that a complex combination of deficient resources, low social support and stress produces adverse birth outcomes. As we have discussed several times in class, low birth weight may make these babies more susceptible to weight gain from high calorie foods as they age over the life-course. The authors conclude that having access to resources, material and otherwise, has protective health effects.
Posted by Amy Helburn Nov. 18th
Psychological Adaptation and Birth Outcomes (Rini, 1999)
In this analysis, the authors tried to control for multiple factors that might have effects on adverse birth outcomes - preterm delivery and low birth weight. They had a small sample of 230 pregnant women, 120 Hispanic and 110 White, who received prenatal care for 3 years. The authors’ assumptions were that personal resources (operationalized as mastery, self-esteem and dispositional optimism), stress (operationalized as state anxiety, pregnancy-related anxiety, perceived chronic stress, and life event distress) and socio-cultural and socio-demographic factors (age, education, marital status, household income, ethnicity, country of birth and years lived in the U.S.) could result in different birth outcomes. While there has been research that linked some maternal characteristics with birth outcomes, SES in particular, the contribution of this article was that they brought personal and adaptive resources into the complex equation. The authors assumed that personal resources such as self-esteem, mastery and optimism play a beneficial role as mediators that reduce stress, which would result in longer gestation. While they found that higher self-esteem, mastery and optimism are associated with perceived stress, they found no evidence that these personal resources have a “buffering” role in lowering stress. Despite the limitations of this study, small sample size in particular, it has expanded the range of factors being considered beyond contextual determinants to include individual determinants, both of the length of pregnancy and of the baby’s birth weight. That is the enlightening feature of the multivariate approach; however, it brings the known concern of “explain everything, explain nothing.” (DBJ’09)