Correlated SNPs

Author

Gibran Hemani

Published

August 12, 2022

MR type 1 error rate

One instrument for X, and X has no influence on Y

library(TwoSampleMR)
library(simulateGP)

library(dplyr)

set.seed(12345)
map <- tibble(snp=1, af=0.5)
params_x <- generate_gwas_params(map=map, h2=0.01, S=-0.4, Pi=1)
params_y <- generate_gwas_params(map=map, h2=0.0, S=-0.4, Pi=1)
nid <- 100000
ss <- summary_set(
    beta_gx=params_x$beta,
    beta_gy=params_y$beta,
    af=params_x$af,
    n_gx=10000,
    n_gy=10000,
    n_overlap=0,
    cor_xy=0.5
)

Perform MR with single causal variant

mr(ss) %>% glimpse()
Analysing 'X' on 'Y'
Rows: 1
Columns: 9
$ id.exposure <chr> "X"
$ id.outcome  <chr> "Y"
$ outcome     <chr> "Y"
$ exposure    <chr> "X"
$ method      <chr> "Wald ratio"
$ nsnp        <dbl> 1
$ b           <dbl> -0.08648138
$ se          <dbl> 0.0892847
$ pval        <dbl> 0.3327436

Perform MR with causal variant + 100 correlated tag SNPs

ss2 <- ss[rep(1,100),] %>% mutate(SNP=1:100)
mr(ss2, method_list="mr_ivw") %>% glimpse()
Analysing 'X' on 'Y'
Rows: 1
Columns: 9
$ id.exposure <chr> "X"
$ id.outcome  <chr> "Y"
$ outcome     <chr> "Y"
$ exposure    <chr> "X"
$ method      <chr> "Inverse variance weighted"
$ nsnp        <int> 100
$ b           <dbl> -0.08648138
$ se          <dbl> 0.00892847
$ pval        <dbl> 3.457243e-22

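The point estimate is unchanged, but the standard error is 10 times smaller (a factor of √100) and the p-value is very small: treating 100 perfectly correlated SNPs as independent instruments inflates the type 1 error rate.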
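The single replicate above can be extended into a rough check of the type 1 error rate. The sketch below uses a fixed-effect IVW estimator written from the standard formula; it is not TwoSampleMR's implementation, and `ivw_fixed`, `b_gx`, `se_gy` and `nrep` are hypothetical names and values chosen for illustration. Under the null, the SNP-outcome estimate is drawn around zero and analysed once as a single SNP and once as 100 identical copies.

```r
set.seed(1)

# Fixed-effect IVW from the standard formula (an assumption of this sketch).
# With a single SNP it reduces to the Wald ratio with se = se_gy / |b_gx|.
ivw_fixed <- function(b_gx, b_gy, se_gy) {
  w <- b_gx^2 / se_gy^2                      # inverse-variance weights
  b <- sum(w * b_gy / b_gx) / sum(w)         # weighted mean of the ratio estimates
  se <- sqrt(1 / sum(w))
  c(b = b, se = se, pval = 2 * pnorm(abs(b / se), lower.tail = FALSE))
}

nrep  <- 2000   # number of simulated null datasets (illustrative)
b_gx  <- 0.1    # illustrative SNP-exposure effect
se_gy <- 0.01   # illustrative standard error of the SNP-outcome effect

pvals <- t(sapply(seq_len(nrep), function(i) {
  b_gy <- rnorm(1, mean = 0, sd = se_gy)     # null: X has no effect on Y
  c(
    single = ivw_fixed(b_gx, b_gy, se_gy)[["pval"]],
    dup100 = ivw_fixed(rep(b_gx, 100), rep(b_gy, 100), rep(se_gy, 100))[["pval"]]
  )
}))

# Proportion of replicates with p < 0.05 for each analysis
colMeans(pvals < 0.05)
```

The single-SNP analysis should reject close to 5% of the time. With 100 copies the z-score is multiplied by 10, so the test rejects whenever the single-SNP |z| exceeds 1.96/10, which happens with probability of roughly 0.84 under the null.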

GRS correlation performance

Tag SNPs are perfectly correlated with causal variant

sim <- function(nid=10000, nsnp=10)
{
  g <- matrix(0, nid, nsnp)
  g[,1] <- rnorm(nid)
  for(i in 2:nsnp)
  {
    g[,i] <- g[,1]
  }
  y <- g[,1] + rnorm(nid)
  summary(lm(y ~ g[,1]))
  grs <- rowSums(g)
  return(c(cor(y, g[,1])^2, cor(y, grs)^2))
}
sapply(1:10, function(i) sim()) %>% rowMeans() %>% tibble(method=c("Causal variant only", "GRS"), rsq=.)
# A tibble: 2 × 2
  method                rsq
  <chr>               <dbl>
1 Causal variant only 0.500
2 GRS                 0.500

The GRS and the single causal variant perform the same, as Jack showed.
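One way to see why the two agree: when every tag SNP is an exact copy of the causal variant, the GRS is just a rescaled causal variant, grs = k g_1 with k = nsnp, and the scale factor cancels in the squared correlation (the symbols k and g_1 are introduced here for illustration):

\[ r^2_{grs} = \frac{cov(k g_1, y)^2}{var(k g_1) var(y)} = \frac{k^2 cov(g_1, y)^2}{k^2 var(g_1) var(y)} = \frac{cov(g_1, y)^2}{var(g_1) var(y)} \]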

Tag SNPs are imperfectly correlated with causal variant

sim <- function(nid=10000, nsnp=10)
{
  g <- matrix(0, nid, nsnp)
  g[,1] <- rnorm(nid)
  for(i in 2:nsnp)
  {
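    # Note: g[,i] is still all zeros at this point, so each tag column is pure noise (sd 0.5),
    # independent of g[,1]; g[,1] + rnorm(nid, sd=0.5) would give tags correlated with it
    g[,i] <- g[,i] + rnorm(nid, sd=0.5)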
  }
  y <- g[,1] + rnorm(nid)
  summary(lm(y ~ g[,1]))
  grs <- rowSums(g)
  return(c(cor(y, g[,1])^2, cor(y, grs)^2))
}
sapply(1:10, function(i) sim()) %>% rowMeans() %>% tibble(method=c("Causal variant only", "GRS"), rsq=.)
# A tibble: 2 × 2
  method                rsq
  <chr>               <dbl>
1 Causal variant only 0.500
2 GRS                 0.156

Now the GRS performs worse, because it combines the variance of the causal SNP with noise that isn’t causally related to the trait.

\[ r^2 = \frac{cov(grs, y)^2}{var(grs) var(y)} \]

i.e. cov(grs, y) isn’t increasing, but var(grs) is.
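The formula can also be evaluated analytically as a sanity check on the simulated value. This is a minimal sketch assuming y = g1 + e with var(g1) = var(e) = 1, and that each of the nsnp - 1 extra columns equals a * g1 plus independent noise. The function `expected_rsq` and the loading `a` are hypothetical and not part of the original code; a = 0 corresponds to the simulation as coded above, where the extra columns are noise only.

```r
# Expected r^2 between y and the GRS under y = g1 + e, var(g1) = var(e) = 1.
# `a` is a hypothetical loading of each extra column on the causal variant:
#   a = 0 -> extra columns are pure noise (the simulation as coded)
#   a = 1 -> extra columns are g1 + noise
expected_rsq <- function(nsnp = 10, sd_noise = 0.5, a = 0) {
  k <- 1 + a * (nsnp - 1)                     # total loading of grs on g1
  cov_grs_y <- k                              # cov(grs, y) = k * var(g1)
  var_grs <- k^2 + (nsnp - 1) * sd_noise^2    # signal variance + independent noise
  var_y <- 2                                  # var(g1) + var(e)
  cov_grs_y^2 / (var_grs * var_y)
}

expected_rsq(a = 0)  # about 0.154, close to the simulated 0.156
expected_rsq(a = 1)  # about 0.489
```

With a = 0 the covariance stays at 1 while var(grs) grows to 3.25, which is what drives r² down. With a = 1 the covariance grows along with the variance and the loss is much smaller.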

Checking

n <- 10000
nsnp <- 10
g <- matrix(0, n, nsnp)
g[,1] <- rnorm(n)
for(i in 2:nsnp)
{
  g[,i] <- g[,i] + rnorm(n, sd=0.5)
}
y <- g[,1] + rnorm(n)
grs <- rowSums(g)
cov(y, grs)
[1] 1.003715
cov(y, g[,1])
[1] 0.9788621
sd(grs)
[1] 1.805151
sd(g[,1])
[1] 0.996039
sessionInfo()
R version 4.2.1 Patched (2022-09-06 r82817)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.10      simulateGP_0.1.2  TwoSampleMR_0.5.6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        plyr_1.8.7        compiler_4.2.1    pillar_1.8.1     
 [5] iterators_1.0.14  tools_4.2.1       mr.raps_0.2       digest_0.6.29    
 [9] jsonlite_1.8.0    evaluate_0.16     lifecycle_1.0.3   tibble_3.1.8     
[13] lattice_0.20-45   pkgconfig_2.0.3   rlang_1.0.6       Matrix_1.4-1     
[17] foreach_1.5.2     DBI_1.1.3         cli_3.4.1         yaml_2.3.5       
[21] xfun_0.33         fastmap_1.1.0     stringr_1.4.1     knitr_1.40       
[25] generics_0.1.3    htmlwidgets_1.5.4 vctrs_0.5.1       tidyselect_1.1.2 
[29] glmnet_4.1-4      grid_4.2.1        nortest_1.0-4     glue_1.6.2       
[33] R6_2.5.1          fansi_1.0.3       survival_3.4-0    rmarkdown_2.16   
[37] purrr_0.3.4       magrittr_2.0.3    ellipsis_0.3.2    codetools_0.2-18 
[41] htmltools_0.5.3   splines_4.2.1     assertthat_0.2.1  shape_1.4.6      
[45] utf8_1.2.2        stringi_1.7.8