Using SEM to partition genetic effects of individual SNPs into maternal and fetal components. David Evans, University of Queensland
Post-doctoral Position in Statistical Genetics/Genomics
Objectives: illustrate the flexibility of SEM; show how SEM can be used to investigate molecular mechanisms; illustrate model building; illustrate the concept of identification; revise key concepts from the week.
Birthweight GWAS (EGG and UKBB). Nicole Warrington, UKBB and EGG consortium. The birthweight GWAS reflects a mixture of maternal and fetal genetic effects. Unrelated individuals*
Conditional Analysis of Genotyped Mother-Offspring Duos
Conditional regression: BWi = βm SNPmi + βc SNPci + εi
Structural equation model (path diagram): SNPm → BW (βm); SNPm → SNPc (0.5); SNPc → BW (βc); εSNP1 → SNPm, εSNP2 → SNPc and ε → BW, each with loading 1.
But there are not many cohorts in the world with these data! Are twins suitable for these analyses?
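The conditional regression above can be tried on simulated duos. The sketch below is illustrative only: the simulation design, the function names `simulate_duos` and `conditional_regression`, the allele frequency, the effect sizes and the unit residual variance are all assumptions, not values from the talk. It fits BW on both genotypes at once by solving the two-predictor normal equations.

```python
import random

def simulate_duos(n, beta_m, beta_c, maf=0.3, seed=1):
    """Simulate n mother-offspring duos at one biallelic SNP (hypothetical setup)."""
    rng = random.Random(seed)

    def allele():
        return 1 if rng.random() < maf else 0

    snp_m, snp_c, bw = [], [], []
    for _ in range(n):
        m1, m2 = allele(), allele()            # mother's two alleles
        transmitted = rng.choice((m1, m2))     # Mendelian transmission to the child
        g_m = m1 + m2
        g_c = transmitted + allele()           # plus a random paternal allele
        snp_m.append(g_m)
        snp_c.append(g_c)
        # Child's birthweight depends on both genotypes (assumed unit residual variance).
        bw.append(beta_m * g_m + beta_c * g_c + rng.gauss(0.0, 1.0))
    return snp_m, snp_c, bw

def conditional_regression(snp_m, snp_c, bw):
    """OLS of BW on SNPm and SNPc jointly; returns (beta_m_hat, beta_c_hat)."""
    n = len(bw)
    mean_m, mean_c, mean_y = sum(snp_m) / n, sum(snp_c) / n, sum(bw) / n
    s_mm = sum((a - mean_m) ** 2 for a in snp_m)
    s_cc = sum((b - mean_c) ** 2 for b in snp_c)
    s_mc = sum((a - mean_m) * (b - mean_c) for a, b in zip(snp_m, snp_c))
    s_my = sum((a - mean_m) * (y - mean_y) for a, y in zip(snp_m, bw))
    s_cy = sum((b - mean_c) * (y - mean_y) for b, y in zip(snp_c, bw))
    det = s_mm * s_cc - s_mc ** 2              # non-zero unless the genotypes are collinear
    return ((s_cc * s_my - s_mc * s_cy) / det,
            (s_mm * s_cy - s_mc * s_my) / det)

snp_m, snp_c, bw = simulate_duos(20000, beta_m=0.3, beta_c=-0.2)
beta_m_hat, beta_c_hat = conditional_regression(snp_m, snp_c, bw)
print(round(beta_m_hat, 2), round(beta_c_hat, 2))
```

With both genotypes in the model, each coefficient is estimated conditional on the other, which is what separates the maternal from the fetal effect.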
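Disentangling Mother and Child Effects on Birth Weight in UKBB
UKBB contains self-reported birthweight (BW) and reported birthweight of first offspring (BWO).
Model equations: SNP = ½GM + εSNP; BW = mGM + cSNP + ε; BWO = cGO + mSNP + εO
Path diagram: latent GM (variance Φ; written GG on later slides) → SNP (0.5) and → BW (m); SNP → BW (c), → BWO (m) and → latent GO (0.5); GO → BWO (c); the residual variances of SNP and GO are 0.75Φ; ε → BW and εO → BWO with loading 1; ε and εO covary (ρ).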
Tracing Rules of Path Analysis
1. Find all distinct chains between variables:
   a. Go backwards along zero or more single-headed arrows
   b. Change direction at one and only one double-headed arrow
   c. Trace forwards along zero or more single-headed arrows
2. Multiply the path coefficients in each chain
3. Sum the results of step 2
4. For the covariance of a variable with itself (its variance), chains are distinct if they have different paths or a different order
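Building The Model: Path Tracing Rules
Expected covariance matrix (lower triangle):
       SNP          BW                               BWO
SNP    Φ
BW     cΦ + ½mΦ     m²Φ + c²Φ + mcΦ + var(ε)
BWO    mΦ + ½cΦ     ½m²Φ + ½c²Φ + ¼mcΦ + mcΦ + ρ     m²Φ + c²Φ + mcΦ + var(εO)
Path diagram: latent GG (variance Φ) → SNP (0.5) and → BW (m); SNP → BW (c), → BWO (m) and → latent GO (0.5); GO → BWO (c); residual variances of SNP and GO are 0.75Φ; ε and εO covary (ρ).
Note that the variances of the two birthweights are going to be the same (assuming the same measurement error). The variances of the SNPs (observed and latent) are all constrained to be equal.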
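As a cross-check on the path-tracing expectations, the same matrix can be computed a second way: write each observed variable as a weighted sum of the underlying sources and apply covariance algebra. The sketch below does both and compares them. The function names, the parameter values and the label `eGO` for the residual of GO are assumptions introduced for illustration.

```python
def expected_cov(phi, m, c, var_e, var_eo, rho):
    """Path-tracing expectations for (SNP, BW, BWO), in that order."""
    s_bw = (c + 0.5 * m) * phi
    s_bwo = (m + 0.5 * c) * phi
    q = (m * m + c * c + m * c) * phi
    cross = (0.5 * m * m + 0.5 * c * c + 0.25 * m * c + m * c) * phi + rho
    return [[phi, s_bw, s_bwo],
            [s_bw, q + var_e, cross],
            [s_bwo, cross, q + var_eo]]

def cov_from_sources(phi, m, c, var_e, var_eo, rho):
    """Same matrix via covariance algebra on the underlying sources."""
    # Source order: GG, eSNP, eGO, e, eO (independent apart from cov(e, eO) = rho).
    # GO = 0.5 SNP + eGO is substituted into the BWO equation.
    loadings = [
        [0.5, 1.0, 0.0, 0.0, 0.0],                        # SNP = 0.5 GG + eSNP
        [m + 0.5 * c, c, 0.0, 1.0, 0.0],                  # BW  = m GG + c SNP + e
        [0.5 * m + 0.25 * c, m + 0.5 * c, c, 0.0, 1.0],   # BWO = c GO + m SNP + eO
    ]
    src = [[0.0] * 5 for _ in range(5)]
    src[0][0] = phi
    src[1][1] = src[2][2] = 0.75 * phi
    src[3][3], src[4][4] = var_e, var_eo
    src[3][4] = src[4][3] = rho
    # Sigma = L * Src * L^T, written out as explicit sums.
    return [[sum(loadings[i][k] * src[k][l] * loadings[j][l]
                 for k in range(5) for l in range(5))
             for j in range(3)] for i in range(3)]

theta = dict(phi=0.42, m=0.2, c=0.1, var_e=1.0, var_eo=1.0, rho=0.2)
by_tracing = expected_cov(**theta)
by_algebra = cov_from_sources(**theta)
matrices_agree = all(abs(by_tracing[i][j] - by_algebra[i][j]) < 1e-12
                     for i in range(3) for j in range(3))
print(matrices_agree)
```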
Building the Model: Covariance Algebra
Model: SNP = ½GG + εSNP; BW = mGG + cSNP + ε; BWO = cGO + mSNP + εO
Rules: (1) cov(cX, Y) = c × cov(X, Y); (2) cov(X + Y, Z) = cov(X, Z) + cov(Y, Z); (3) cov(X, X) = var(X)
var(SNP) = cov(SNP, SNP)
 = cov(½GG + εSNP, ½GG + εSNP)
 = cov(½GG, ½GG) + cov(½GG, εSNP) + cov(εSNP, ½GG) + cov(εSNP, εSNP)
 = ¼cov(GG, GG) + 0 + 0 + cov(εSNP, εSNP)
 = ¼var(GG) + var(εSNP)
 = ¼Φ + ¾Φ
 = Φ
Building the Model: Covariance Algebra
Model: SNP = ½GG + εSNP; BW = mGG + cSNP + ε; BWO = cGO + mSNP + εO
Rules: (1) cov(cX, Y) = c × cov(X, Y); (2) cov(X + Y, Z) = cov(X, Z) + cov(Y, Z); (3) cov(X, X) = var(X)
cov(BW, SNP) = cov(mGG + cSNP + ε, ½GG + εSNP)
 = cov(mGG, ½GG) + cov(mGG, εSNP) + cov(cSNP, ½GG) + cov(cSNP, εSNP) + cov(ε, ½GG) + cov(ε, εSNP)
 = ½m cov(GG, GG) + m cov(GG, εSNP) + ½c cov(SNP, GG) + c cov(SNP, εSNP) + ½cov(ε, GG) + cov(ε, εSNP)
 = ½m var(GG) + 0 + ½c × ½Φ + c × ¾Φ + 0 + 0
 = ½mΦ + cΦ
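The same three rules give the covariance between the two birthweights. The derivation below is a worked sketch rather than a slide from the talk. It uses cov(GG, SNP) = ½Φ, cov(SNP, GO) = ½Φ and cov(GG, GO) = ¼Φ, which follow from the 0.5 paths in the diagram, and drops every term that pairs a residual with a genotype, since those covariances are zero.

```latex
\begin{align*}
\operatorname{cov}(BW, BW_O)
 &= \operatorname{cov}(mG_G + c\,SNP + \varepsilon,\; cG_O + m\,SNP + \varepsilon_O) \\
 &= mc\,\operatorname{cov}(G_G, G_O) + m^2\operatorname{cov}(G_G, SNP)
    + c^2\operatorname{cov}(SNP, G_O) + mc\,\operatorname{var}(SNP)
    + \operatorname{cov}(\varepsilon, \varepsilon_O) \\
 &= \tfrac14 mc\,\Phi + \tfrac12 m^2\Phi + \tfrac12 c^2\Phi + mc\,\Phi + \rho
\end{align*}
```

This matches the BW-BWO entry obtained by path tracing.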
Understanding SEM: Σ = Σ(θ)
Σ is the observed sample covariance matrix. Σ(θ) is the expected covariance matrix, a function of the model parameters θ. Parameters are chosen to minimize the difference between the observed and expected covariance matrices.
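The slide does not say how the "difference" is measured. One common choice, shown here as an assumption rather than as the method used in the talk, is the normal-theory maximum-likelihood discrepancy function. Writing S for the observed sample covariance matrix (the slide's Σ) and p for the number of observed variables:

```latex
F_{\mathrm{ML}}(\theta) \;=\; \ln\lvert\Sigma(\theta)\rvert
  \;+\; \operatorname{tr}\!\bigl(S\,\Sigma(\theta)^{-1}\bigr)
  \;-\; \ln\lvert S\rvert \;-\; p
```

The function is zero when Σ(θ) = S and positive otherwise, so minimizing it over θ pulls the expected matrix towards the observed one.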
Identification: all parameters in a model can be estimated uniquely given the data. A necessary (but not sufficient) condition for identifiability is that you have at least as many observed statistics as parameters you want to estimate. If all parameters in a model are identified, then the model as a whole is identified. Even if the model as a whole is unidentified, some parameters may still be identified.
Identified or Not?
(1) θ1 + θ2 = 10
(2) θ1 + θ2 = 10 and θ1 - θ2 = 0
(3) θ1 + θ2 = 10 and 2θ1 + 2θ2 = 20
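For linear systems like these three, identification comes down to whether the coefficient matrix has as many independent rows as there are unknowns. The sketch below checks this with exact fractions. The helper names `rank` and `identified` are hypothetical, and the check assumes each system is consistent, which all three are.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    mat = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(mat[0]) if mat else 0
    r = 0
    for col in range(n_cols):
        # Find a row at or below r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue
        mat[r], mat[pivot] = mat[pivot], mat[r]
        for i in range(r + 1, len(mat)):
            factor = mat[i][col] / mat[r][col]
            mat[i] = [a - factor * b for a, b in zip(mat[i], mat[r])]
        r += 1
    return r

def identified(coefficients):
    """True when the equations pin down every unknown uniquely."""
    return rank(coefficients) == len(coefficients[0])

# Coefficients of (theta1, theta2) in each system.
systems = {
    1: [[1, 1]],             # one equation, two unknowns
    2: [[1, 1], [1, -1]],    # two independent equations
    3: [[1, 1], [2, 2]],     # second equation is a multiple of the first
}
results = {k: identified(v) for k, v in systems.items()}
print(results)
```

System (3) has two equations and two unknowns, so it passes the counting rule, yet it is not identified. That is why the counting rule is necessary but not sufficient.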
Identification in Twin Models
Observed ΣMZ (lower triangle): VARMZ-T1; COVMZ, VARMZ-T2. Expected Σ(θ)MZ: VA + VC + VE; VA + VC, VA + VC + VE.
Observed ΣDZ (lower triangle): VARDZ-T1; COVDZ, VARDZ-T2. Expected Σ(θ)DZ: VA + VC + VE; ½VA + VC, VA + VC + VE.
How many observed statistics? How many parameters? Why can't we model VA, VC, VD and VE?
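Under the expectations on this slide the variance is the same for both twins and both zygosity groups, so the model supplies three distinct equations: one for the variance, one for COVMZ and one for COVDZ. That is enough to solve for VA, VC and VE, but it leaves no equation for a fourth parameter such as VD. A minimal sketch of the solution follows; the function name and the example values are hypothetical.

```python
def ace_from_twin_covariances(var_total, cov_mz, cov_dz):
    """Solve the three ACE expectations for (VA, VC, VE).

    var_total = VA + VC + VE
    cov_mz    = VA + VC
    cov_dz    = 0.5 * VA + VC
    """
    va = 2.0 * (cov_mz - cov_dz)   # cov_mz - cov_dz = 0.5 * VA
    vc = 2.0 * cov_dz - cov_mz     # cov_mz - VA
    ve = var_total - cov_mz        # what the MZ covariance does not explain
    return va, vc, ve

# Round trip: build the expected statistics from known components, then recover them.
true_va, true_vc, true_ve = 0.5, 0.2, 0.3
va, vc, ve = ace_from_twin_covariances(
    var_total=true_va + true_vc + true_ve,
    cov_mz=true_va + true_vc,
    cov_dz=0.5 * true_va + true_vc,
)
print(va, vc, ve)
```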
Identified or Not? How many observed statistics? How many parameters? (Expected covariance matrix of SNP, BW and BWO and path diagram as on the path-tracing slide.)
Identified or Not? Φ: Φ = var(SNP)
Identified or Not? c and m: cΦ + ½mΦ = cov(BW, SNP) and mΦ + ½cΦ = cov(BWO, SNP)
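Dividing both equations by Φ, which is already identified from var(SNP), leaves two linear equations in c and m. The worked solution below is a sketch of that step, not a formula taken from the slides.

```latex
\begin{align*}
c + \tfrac12 m &= \frac{\operatorname{cov}(BW, SNP)}{\Phi}, &
m + \tfrac12 c &= \frac{\operatorname{cov}(BW_O, SNP)}{\Phi} \\[4pt]
c &= \frac{4\operatorname{cov}(BW, SNP) - 2\operatorname{cov}(BW_O, SNP)}{3\Phi}, &
m &= \frac{4\operatorname{cov}(BW_O, SNP) - 2\operatorname{cov}(BW, SNP)}{3\Phi}
\end{align*}
```

The determinant of the coefficient matrix is 1 - ¼ = ¾, which is non-zero, so the two equations are independent and both c and m are identified.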
Identified or Not? var(ε): m²Φ + c²Φ + mcΦ + var(ε) = var(BW)
Identified or Not? var(εO): m²Φ + c²Φ + mcΦ + var(εO) = var(BWO)
Identified or Not? ρ: ½m²Φ + ½c²Φ + ¼mcΦ + mcΦ + ρ = cov(BW, BWO)
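Taken in order, the six equations can each be solved for one parameter: Φ from var(SNP), then c and m from the two SNP covariances, then the two residual variances and ρ from the remaining entries. The sketch below chains these steps. The function name is hypothetical and the example matrix was built from assumed values (Φ = 0.5, m = 0.2, c = 0.1, var(ε) = var(εO) = 1, ρ = 0.2), not from data.

```python
def recover_parameters(cov):
    """Recover (phi, m, c, var_e, var_eo, rho) from the covariance matrix
    of (SNP, BW, BWO), given in that order."""
    phi = cov[0][0]                          # var(SNP) = phi
    a = cov[1][0] / phi                      # c + m/2
    b = cov[2][0] / phi                      # m + c/2
    c = (4.0 * a - 2.0 * b) / 3.0            # solve the 2x2 linear system
    m = (4.0 * b - 2.0 * a) / 3.0
    q = (m * m + c * c + m * c) * phi        # genetic part of both variances
    var_e = cov[1][1] - q
    var_eo = cov[2][2] - q
    rho = cov[2][1] - (0.5 * m * m + 0.5 * c * c + 1.25 * m * c) * phi
    return phi, m, c, var_e, var_eo, rho

# Expected covariance matrix implied by the assumed values listed above.
example_cov = [
    [0.5,   0.1,   0.125],
    [0.1,   1.035, 0.225],
    [0.125, 0.225, 1.035],
]
recovered = recover_parameters(example_cov)
print([round(x, 6) for x in recovered])
```

Because every parameter has a unique solution, the model as a whole is identified.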
Can we go further?
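Disentangling Mother and Child Effects on Blood Pressure (SBP)
Observed variables: SNP, SBP, SBPm.
Path diagram: latent GG (variance Φ) → latent Gm (0.5) and → SBPm (m); Gm → SBPm (c), → SBP (m) and → SNP (0.5); SNP → SBP (c); the residual variances of Gm and SNP are 0.75Φ; the residuals εm (of SBPm) and ε (of SBP) covary (ρ).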
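Disentangling Mother and Child Effects on Blood Pressure (SBP)
Expected covariance matrix (lower triangle):
        SNP            SBP                              SBPm
SNP     Φ
SBP     cΦ + ½mΦ       m²Φ + c²Φ + mcΦ + var(ε)
SBPm    ½cΦ + ¼mΦ      ½c²Φ + ½m²Φ + mcΦ + ¼mcΦ + ρ     m²Φ + c²Φ + mcΦ + var(εm)
Path diagram: latent GG (variance Φ) → latent Gm (0.5) and → SBPm (m); Gm → SBPm (c), → SBP (m) and → SNP (0.5); SNP → SBP (c); residual variances of Gm and SNP are 0.75Φ; εm and ε covary (ρ).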
Identified or Not? How many observed statistics? How many parameters? (Expected covariance matrix of SNP, SBP and SBPm and path diagram as on the previous slide.)
Identified or Not? Φ (Expected covariance matrix of SNP, SBP and SBPm and path diagram as on the earlier slide.)
Identified or Not? c, m? (Expected covariance matrix of SNP, SBP and SBPm and path diagram as on the earlier slide.)
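In this model the two SNP covariances are cΦ + ½mΦ and ½cΦ + ¼mΦ, and the second is exactly half the first, so together they constrain only the sum c + ½m. The sketch below makes the consequence concrete by constructing two different parameter sets that imply the same expected covariance matrix. The function names and numerical values are assumptions for illustration, and the construction is only meaningful while the adjusted residual variances stay positive.

```python
def expected_cov_sbp(phi, m, c, var_e, var_em, rho):
    """Path-tracing expectations for (SNP, SBP, SBPm), in that order."""
    s_sbp = (c + 0.5 * m) * phi
    s_sbpm = (0.5 * c + 0.25 * m) * phi      # always half of s_sbp
    q = (m * m + c * c + m * c) * phi
    cross = (0.5 * c * c + 0.5 * m * m + 1.25 * m * c) * phi + rho
    return [[phi, s_sbp, s_sbpm],
            [s_sbp, q + var_e, cross],
            [s_sbpm, cross, q + var_em]]

def equivalent_parameters(phi, m, c, var_e, var_em, rho, new_m):
    """A second parameter set, with maternal effect new_m, that implies
    the same covariance matrix as the first."""
    new_c = c + 0.5 * m - 0.5 * new_m        # keeps c + m/2 unchanged
    q_old = m * m + c * c + m * c
    q_new = new_m * new_m + new_c * new_c + new_m * new_c
    r_old = 0.5 * c * c + 0.5 * m * m + 1.25 * m * c
    r_new = 0.5 * new_c * new_c + 0.5 * new_m * new_m + 1.25 * new_m * new_c
    # Residual terms absorb whatever the genetic terms gain or lose.
    return (phi, new_m, new_c,
            var_e + (q_old - q_new) * phi,
            var_em + (q_old - q_new) * phi,
            rho + (r_old - r_new) * phi)

theta_a = (0.5, 0.2, 0.1, 1.0, 1.0, 0.2)
theta_b = equivalent_parameters(*theta_a, new_m=0.0)
cov_a = expected_cov_sbp(*theta_a)
cov_b = expected_cov_sbp(*theta_b)
same_matrix = all(abs(cov_a[i][j] - cov_b[i][j]) < 1e-12
                  for i in range(3) for j in range(3))
print(theta_b, same_matrix)
```

Since a maternal effect of 0.2 and a maternal effect of 0 fit the same matrix equally well, m and c cannot be separated from these three observed variables.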
Intuition: To estimate maternal effects, we need individuals with observed genotypes who have reported their offspring's phenotype. In the first situation, where we examine birthweight, we have this. In the second situation, where we examine blood pressure, we do not.
Questions?