# Age interactions

GxE
Author

Gibran Hemani

Published

October 5, 2023

## Background

We can infer the age-specific effect of trait-variant association in various ways. How do they relate? e.g.

• Stratify individuals by age group and get the main effect
• Fit age as an interaction term
• 2-step linear model - get per-individual age slopes and intercepts, estimate genetic effects on slopes and parameters

Age isn’t a collider so stratifying shouldn’t introduce problems.

Approach - each individual has a growth curve with parameters that are genetically influenced. What happens when we age stratify using different methods?

``library(dplyr)``
``````
Attaching package: 'dplyr'``````
``````The following objects are masked from 'package:stats':

filter, lag``````
``````The following objects are masked from 'package:base':

intersect, setdiff, setequal, union``````
``````library(ggplot2)
library(lme4)``````
``Loading required package: Matrix``
``set.seed(1234)``

Simulate data

``````growth_curve2 <- function(x, phi1=20, phi2=-2.4, phi3=0.3) {
g <- phi1 / (1 + exp(-(phi2 + phi3 * x)))
g + rnorm(length(g), sd=g/(max(g)*10))
}

nid <- 50000

bg1 <- 5/2
bg2 <- 0.1/2
pg1 <- 0.3
pg2 <- 0.3

bmi <- lapply(1:nid, function(i) {
tibble(
id=i,
g1 = rbinom(1, 2, pg1),
g2 = rbinom(1, 2, pg2),
age=sample(0:50, 20, replace=FALSE),
value=growth_curve2(age, g1 * bg1 + rnorm(1, mean=23, sd=5), -2.4, g2 * bg2 + rnorm(1, mean=0.3, sd=0.1))
) %>% arrange(age)
}) %>% bind_rows()

str(bmi)``````
``````tibble [1,000,000 × 5] (S3: tbl_df/tbl/data.frame)
\$ id   : int [1:1000000] 1 1 1 1 1 1 1 1 1 1 ...
\$ g1   : int [1:1000000] 0 0 0 0 0 0 0 0 0 0 ...
\$ g2   : int [1:1000000] 1 1 1 1 1 1 1 1 1 1 ...
\$ age  : int [1:1000000] 3 4 5 8 13 14 15 20 21 25 ...
\$ value: num [1:1000000] 4.03 4.72 5.49 8.35 14.52 ...``````

Example of what simulated data looks like

``````ggplot(bmi %>% filter(id < 11), aes(x=age, y=value)) +
geom_point(aes(colour=as.factor(id))) +
geom_smooth(aes(colour=as.factor(id)), se=FALSE)``````
```geom_smooth()` using method = 'loess' and formula = 'y ~ x'``

Stratifying by age, and getting main effect in each stratum

``````o <- lapply(unique(bmi\$age), function(a) {
x <- subset(bmi, age == a) %>% slice_sample(n=1000)
print(dim(x))
r <- bind_rows(
summary(lm(value ~ g1, x))\$coef %>% as_tibble() %>% mutate(g=1) %>% slice_tail(n=1),
summary(lm(value ~ g2, x))\$coef %>% as_tibble() %>% mutate(g=2) %>% slice_tail(n=1)
)
names(r) <- c("beta", "se", "tval", "pval", "g")
r\$age <- a
return(r)
}) %>% bind_rows()``````
``````[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5
[1] 1000    5``````
``o``
``````# A tibble: 102 × 6
beta     se  tval     pval     g   age
<dbl>  <dbl> <dbl>    <dbl> <dbl> <int>
1 0.578 0.0753  7.68 3.88e-14     1     3
2 0.653 0.0750  8.71 1.28e-17     2     3
3 0.481 0.114   4.21 2.82e- 5     1     4
4 1.08  0.117   9.17 2.57e-19     2     4
5 1.13  0.164   6.91 8.44e-12     1     5
6 1.46  0.163   8.92 2.25e-18     2     5
7 1.59  0.257   6.17 9.71e-10     1     8
8 1.67  0.250   6.67 4.32e-11     2     8
9 2.52  0.286   8.81 5.51e-18     1    13
10 1.40  0.297   4.71 2.85e- 6     2    13
# ℹ 92 more rows``````
``````ggplot(o, aes(y=beta, x= age)) +
geom_point(aes(colour=g))``````

Makes sense because g1 influences the asymptote, and g2 influences the rate of growth.

Interaction with main effect - does it give the same thing?

``````oi1 <- bmi %>% group_by(id) %>%
slice_sample(n=1) %>%
{summary(lm(value ~ g1 * as.factor(age), .))\$coef}
oi2 <- bmi %>% group_by(id) %>%
slice_sample(n=1) %>%
{summary(lm(value ~ g2 * as.factor(age), .))\$coef}
oi1 <- bind_cols(p = rownames(oi1), oi1) %>% as_tibble %>% filter(grepl("g1:", p)) %>% mutate(age=gsub("g1\\:as.factor\\(age\\)", "", p) %>% as.numeric(), g=1)
oi2 <- bind_cols(p = rownames(oi2), oi2) %>% as_tibble %>% filter(grepl("g2:", p)) %>% mutate(age=gsub("g2\\:as.factor\\(age\\)", "", p) %>% as.numeric(), g=2)

ggplot(bind_rows(oi1, oi2), aes(y=Estimate, x=age)) +
geom_point(aes(colour=g))``````

Linear model for each individual

``````temp <- bmi %>% filter(id %in% 1:1000)
a <- lmer(value ~ age + (1+age|id), data=temp)``````
``````Warning in checkConv(attr(opt, "derivs"), opt\$par, ctrl = control\$checkConv, :
Model failed to converge with max|grad| = 0.0579551 (tol = 0.002, component 1)``````
``````Warning in checkConv(attr(opt, "derivs"), opt\$par, ctrl = control\$checkConv, : Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?``````
``````b <- ranef(a)
``````  (Intercept)          age
1 -3.11384947  0.164278524
2 -4.39460900  0.033832734
3  4.75739472 -0.012807851
4 11.70591067  0.013564999
5 -0.06871647 -0.050490788
6  5.65167436 -0.005632972``````
``````rbind(
summary(lm(b\$id[,1] ~ temp\$g1[!duplicated(temp\$id)]))\$coef[2,],
summary(lm(b\$id[,2] ~ temp\$g1[!duplicated(temp\$id)]))\$coef[2,],
summary(lm(b\$id[,1] ~ temp\$g2[!duplicated(temp\$id)]))\$coef[2,],
summary(lm(b\$id[,2] ~ temp\$g2[!duplicated(temp\$id)]))\$coef[2,]
)``````
``````        Estimate  Std. Error   t value     Pr(>|t|)
[1,]  1.05784708 0.194558484  5.437168 6.807184e-08
[2,]  0.02622162 0.004303497  6.093097 1.579452e-09
[3,]  1.30851566 0.195321406  6.699295 3.493596e-11
[4,] -0.02265172 0.004374294 -5.178370 2.706779e-07``````

### Summary

• Stratifying by age gives same result as fitting age as an interaction - provided that interaction includes main effects for snp and age
• Linear model still picks things up

## Experiment - interactions with a collider

If there is a binary variable that is a collider, does it make a difference if it is tested stratified or as an interaction term

note this isn’t an issue for age which can’t be a collider

``````n <- 10000
x <- rnorm(n)
y <- rnorm(n)
u <- x + y + rnorm(n)
C <- rbinom(n, 1, plogis(u))
summary(lm(y ~ x, subset=C==1))\$coef[2,1]``````
``[1] -0.1274305``
``summary(lm(y ~ x, subset=C==0))\$coef[2,1]``
``[1] -0.1197606``
``summary(lm(y ~ x))\$coef[2,1]``
``[1] 0.003735308``
``summary(lm(y ~ x * as.factor(C)))``
``````
Call:
lm(formula = y ~ x * as.factor(C))

Residuals:
Min      1Q  Median      3Q     Max
-3.6992 -0.6213 -0.0028  0.6146  3.3290

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.37280    0.01379 -27.034   <2e-16 ***
x               -0.11976    0.01423  -8.415   <2e-16 ***
as.factor(C)1    0.74257    0.01964  37.802   <2e-16 ***
x:as.factor(C)1 -0.00767    0.01993  -0.385      0.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9241 on 9996 degrees of freedom
Multiple R-squared:  0.1251,    Adjusted R-squared:  0.1248
F-statistic: 476.5 on 3 and 9996 DF,  p-value: < 2.2e-16``````
``summary(lm(y ~ x : as.factor(C)))``
``````
Call:
lm(formula = y ~ x:as.factor(C))

Residuals:
Min      1Q  Median      3Q     Max
-4.0158 -0.6645 -0.0068  0.6616  3.2348

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.006836   0.010499  -0.651    0.515
x:as.factor(C)0  0.009325   0.014769   0.631    0.528
x:as.factor(C)1 -0.001655   0.014488  -0.114    0.909

Residual standard error: 0.988 on 9997 degrees of freedom
Multiple R-squared:  4.045e-05, Adjusted R-squared:  -0.0001596
F-statistic: 0.2022 on 2 and 9997 DF,  p-value: 0.8169``````

## Summary

• interaction with a collider is a problem as expected

``sessionInfo()``
``````R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lme4_1.1-33   Matrix_1.5-4  ggplot2_3.4.2 dplyr_1.1.2

loaded via a namespace (and not attached):
[1] gtable_0.3.3      jsonlite_1.8.7    compiler_4.3.0    Rcpp_1.0.10
[5] tidyselect_1.2.0  splines_4.3.0     scales_1.2.1      boot_1.3-28.1
[9] yaml_2.3.7        fastmap_1.1.1     lattice_0.21-8    R6_2.5.1
[13] labeling_0.4.2    generics_0.1.3    knitr_1.43        htmlwidgets_1.6.2
[17] MASS_7.3-58.4     tibble_3.2.1      nloptr_2.0.3      munsell_0.5.0
[21] minqa_1.2.5       pillar_1.9.0      rlang_1.1.1       utf8_1.2.3
[25] xfun_0.39         cli_3.6.1         mgcv_1.8-42       withr_2.5.0
[29] magrittr_2.0.3    digest_0.6.31     grid_4.3.0        rstudioapi_0.14
[33] lifecycle_1.0.3   nlme_3.1-162      vctrs_0.6.3       evaluate_0.21
[37] glue_1.6.2        farver_2.1.1      fansi_1.0.4       colorspace_2.1-0
[41] rmarkdown_2.22    tools_4.3.0       pkgconfig_2.0.3   htmltools_0.5.5  ``````