# Gene-environment equivalence

Author

Gibran Hemani

Published

March 15, 2023

## Background

Latest PRS for height explains ~40% of the variance. Heritability ~0.7. Height has a negative association with coronary heart disease. Height is fixed after adolescence, which means it can only be influenced by a) genetic factors; b) early life exposures; c) stochasticity.

Question: Is the influence of height on CHD different when the modifier is genetic vs non-genetic?

Analysis: Estimate the association of height PRS on CHD, and then residualise height for the PRS and determine the residual height association with CHD.

## Model

### Gene environment equivalence

Under gene-environment equivalence, the following model:

$y_i = \alpha + \beta_{xy} x_i + \epsilon_i$

where $$y_i$$ is the outcome (e.g. CHD), $$\beta_{xy}$$ is the causal effect of the exposure on the outcome and

$x_i = a + \sqrt{h^2} g_i + \sqrt{1-h^2} e_i$

where $$x_i$$ is the exposure (height) in the $$i$$th individual, $$h^2$$ is the heritability of height, $$g_i$$ is the (perfectly measured) genetic value for individual $$i$$ and $$e_i$$ is the non-genetic component of height. There is no confounding in this simple model. What is the expected association of g on y?

\begin{aligned} \beta_{gy} &= \frac{cov(y, g)}{var(g)} \\ &= \frac{cov(\beta_{xy}x, g)}{ var(g) } \\ &= \frac{cov(\beta_{xy}\sqrt{h^2}g, g)}{ var(g) } \\ &= \frac{\beta_{xy}\sqrt{h^2}var(g)}{ var(g) } \\ &= \beta_{xy}\sqrt{h^2} \end{aligned}

So this is expected and the basic result for MR. What is the expected association of x on y after it has been residualised for g? First get the residual assuming perfect information of $$g$$

\begin{aligned} \hat{e_i} &= x_i - \hat{\sqrt{h^2}}g_i \\ &\approx \sqrt{1-h^2}e_i \end{aligned}

Now find the association of $$\hat{e}_i$$ with $$y_i$$

\begin{aligned} \beta_{\hat{e}y} &= \frac{cov(\hat{e}, y)}{var(\hat{e})} \\ &= \frac{cov(\sqrt{1-h^2}e, \beta_{xy}(\sqrt{1-h^2}e)}{(1-h^2)var(e)} \\ &= \frac{\beta_{xy}(1-h^2)var(e)}{(1-h^2)var(e)} \\ &= \beta_{xy} \end{aligned}

RESULT: The residual height association with y is expected to be equal to the raw height association with y.

### Gene environment non-equivalence

Now allow the genetic and non-genetic components of height to have independent influences on CHD.

$x_i = \sqrt{h^2}g_i + \sqrt{1-h^2}e_i$

as before, and now

$y_i = \beta_g g_i + \beta_e e_i + \epsilon_i$

Expected association of x on y

\begin{aligned} \beta_{xy} &= \frac{cov(\sqrt{h^2}g_i + \sqrt{1-h^2}e_i,\beta_g g_i + \beta_e e_i)}{var(x)} \\ &= \frac{\sqrt{h^2}\beta_g var(g) + \sqrt{1-h^2}\beta_e var(e)}{h^2 var(g) + (1-h^2) var(e)} \\ \end{aligned}

Assuming $$var(g)=var(e)=1$$ this reduces to

$\beta_{xy} = \sqrt{h^2}\beta_g + \sqrt{1-h^2}\beta_e$

Now what is the expected association of x on y after it has been residualised for g?

\begin{aligned} \beta_{\hat{e}y} &= cov(\hat{e}, y) / var(\hat{e}) \\ &= \frac{cov(\sqrt{1-h^2}e, \beta_g g + \beta_e e)}{(1-h^2)var(e)} \\ &= \frac{\sqrt{1-h^2}\beta_evar(e)}{(1-h^2)var(e)} \\ &= \sqrt{\frac{(1-h^2)\beta_e^2 var(e)^2}{(1-h^2)^2 var(e)^2}} \\ &= \frac{\beta_e}{\sqrt{1-h^2}} \end{aligned}

## Check with simulations

### Gene environment equivalence

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

filter, lag
The following objects are masked from 'package:base':

intersect, setdiff, setequal, union
library(ggplot2)
n <- 100000
bxy <- 0.3
h2 <- 0.7
g <- rnorm(n)
e <- rnorm(n)
x <- sqrt(h2) * g + sqrt(1-h2) * e
y <- x * bxy + rnorm(n)

Get residual

xres <- residuals(lm(x ~ g)) # sim
xres_exp <- sqrt(1-h2) * e # expected
# check they're the same
cor(xres, xres_exp)
 0.9999938
lm(xres ~ xres_exp)$coef  xres_exp 0.9999876  Assoc of x on y should be lm(y ~ x)$coef # sim
        x
0.3002834 
bxy # expected
 0.3

Expected assoc of g on y

lm(y ~ g)$coef # sim  g 0.2523371  sqrt(h2)*bxy # expected  0.250998 Expected assoc of residual x on y lm(y ~ xres)$coef # sim
     xres
0.2988134 
bxy
 0.3

### Gene environment non-equivalence

n <- 100000
bg <- 0.3
be <- 0.6
h2 <- 0.7
g <- rnorm(n)
e <- rnorm(n)
x <- sqrt(h2) * g + sqrt(1-h2) * e
y <- g * bg + e * be + rnorm(n)

Get residual

xres <- residuals(lm(x ~ g)) # sim
xres_exp <- sqrt(1-h2) * e # expected
# check they're the same
cor(xres, xres_exp)
 0.9999998
lm(xres ~ xres_exp)$coef  xres_exp 0.9999996  Association of xres and y lm(y ~ xres)$coef # sim
    xres
1.086526 
be / sqrt(1-h2)
 1.095445

Assoc of x and y - simulation

lm(y ~ x)$coef # sim  x 0.5760577  bg * sqrt(h2) + be * sqrt(1-h2) # exp  0.5796315 ## Incomplete adjustment of g Does incomplete adjustment of g change things much? ### Gene environment equivalence n <- 100000 bxy <- 0.3 h2 <- 0.7 g_exp <- 0.5 # proportion of g explained by prs g_explained <- rnorm(n) * sqrt(g_exp) g_unexplained <- rnorm(n) * sqrt(1-g_exp) g <- g_explained + g_unexplained e <- rnorm(n) x <- sqrt(h2) * g + sqrt(1-h2) * e y <- x * bxy + rnorm(n) Get residual - now this is different from expected above due to residual including some unadjusted genetic variance, but linearly still related xres <- residuals(lm(x ~ g_explained)) # sim xres_exp <- sqrt(1-h2) * e # expected # check they're the same cor(xres, xres_exp)  0.6786454 lm(xres ~ xres_exp)$coef
xres_exp
1.001644 

What is the PRS (explained g) assoc with y?

lm(y ~ g_explained)$coef g_explained 0.2557102  Assoc of height with y lm(y ~ x)$coef
        x
0.3021427 

And residual with y

lm(y ~ xres)$coef  xres 0.3004327  So the residual x effect remains identical to the raw effect of x. ## In progress… (ignore for now) Does confounding change things much? n <- 100000 bxy <- 0.3 bu <- 0.3 h2 <- 0.7 u <- rnorm(n) g <- rnorm(n) e <- rnorm(n) x <- sqrt(h2) * g + sqrt(1-h2) * e + u * bu y <- x * bxy + rnorm(n) + u * bu y ~ x lm(y ~ x)$coef
        x
0.3807293 

y ~ x_res

xres <- residuals(lm(x ~ g)) # sim
xres_exp <- sqrt(1-h2) * e # expected
# check they're the same
cor(xres, xres_exp)
 0.8756121
lm(xres ~ xres_exp)$coef  xres_exp 0.9991385  lm(y ~ xres)$coef
     xres
0.5229343 
lm(y ~ xres_exp)$coef  xres_exp 0.2933105  lm(y ~ g)$coef
        g
0.2532887 
testdat <- function(y, x, g)
{
xres <- residuals(lm(x ~ g))
tribble(~model, ~beta,
"y ~ x", lm(y ~ x)$coef, "y ~ g", lm(y ~ g)$coef,
"y ~ xres", lm(y ~ xres)$coef ) } No confounding, gene environment equivalence n <- 100000 bxy <- -0.3 h2 <- 0.7 g <- rnorm(n) e <- rnorm(n) x <- sqrt(h2) * g + sqrt(1-h2) * e y <- x * bxy + rnorm(n) testdat(y, x, g) # A tibble: 3 × 2 model beta <chr> <dbl> 1 y ~ x -0.306 2 y ~ g -0.256 3 y ~ xres -0.300 n <- 100000 buy <- 0.3 bux <- -0.3 bxy <- 0.3 h2 <- 0.7 u <- rnorm(n) g <- rnorm(n) e <- rnorm(n) x <- sqrt(h2) * g + sqrt(1-h2) * e + bux * u y <- x * bxy + rnorm(n) + buy * u testdat(y, x, g) # A tibble: 3 × 2 model beta <chr> <dbl> 1 y ~ x 0.217 2 y ~ g 0.254 3 y ~ xres 0.0633 n <- 100000 buy <- 0.3 bux <- -0.3 bg <- -0.2 be <- -0.3 h2 <- 0.7 u <- rnorm(n) g <- rnorm(n) e <- rnorm(n) x <- sqrt(h2) * g + sqrt(1-h2) * e + bux * u y <- g * bg + e * be + rnorm(n) + buy * u testdat(y, x, g) # A tibble: 3 × 2 model beta <chr> <dbl> 1 y ~ x -0.392 2 y ~ g -0.205 3 y ~ xres -0.655 param <- expand.grid( buy = seq(-0.3, 0.3, by=0.3), bux = seq(-0.3, 0.3, by=0.3), bg = seq(-0.3, 0.3, by=0.3), be = seq(-0.3, 0.3, by=0.3) ) res <- lapply(1:nrow(param), function(i) { n <- 100000 buy <- param$buy[i]
bux <- param$bux[i] bg <- param$bg[i]
be <- param\$be[i]
h2 <- 0.7
u <- rnorm(n)
g <- rnorm(n, sd=sqrt(h2))
e <- rnorm(n, sd=sqrt(1-h2))
x <- g + e + bux * u
y <- g * bg + e * be + rnorm(n) + buy * u
testdat(y, x, g) %>% bind_cols(param[i,])
})
res <- bind_rows(res)
res %>% mutate(ge = bg-be) %>%
filter(bg == -0.3) %>%
#filter(round(bg, 1) == -0.2, be < 0, round(buy, 1) %in% c(-0.3, 0, 0.3), round(bux, 1) %in% c(-0.3, 0, 0.3)) %>%
ggplot(., aes(x=beta, y=ge)) +
geom_point(aes(colour=model)) +
geom_line(aes(colour=model)) +
scale_colour_brewer(type="qual") n <- 100000
bux <- 0
bg <- -0.4
be <- -0.2
h2 <- 0.5
u <- rnorm(n)
g <- rnorm(n, sd=sqrt(h2))
e <- rnorm(n, sd=sqrt(1-h2))
x <- g + e
y <- g * bg + e * be + rnorm(n)
testdat(y, x, g)
# A tibble: 3 × 2
model      beta
<chr>     <dbl>
1 y ~ x    -0.301
2 y ~ g    -0.397
3 y ~ xres -0.205
library(simulateGP)
g <- make_geno(1000, 100, 0.3)
b <- rnorm(100)
prs <- g %*% b
ve <- var(prs) / 0.7 - var(prs)
e <- rnorm(1000, sd=sqrt(ve))
x <- prs + e
cor(x, prs)^2
          [,1]
[1,] 0.6871713
var(prs) / var(x)
          [,1]
[1,] 0.6877084