Gibran Hemani’s lab book - MR Standard errors

Background

Is there a problem of standard errors in MR methods being overly precise?

Simulate summary data where the exposure has an influence of bxy on the outcome, but the SNPs may have independent effects on the outcome as well.

Simulate 10k SNPs, with 50% of them contributing to heritability.

The pleiotropy effect is that for X and Y, 50% of SNPs are selected independently to contribute to some heritability.

The causal effect is that the the bgx effects on have an additional effect on y of bgx * bxy.

bxy = 0, pleiotropy = 0
bxy = 0, pleiotropy = 0.4
bxy = 0.4, pleiotropy = 0
bxy = 0.4, pleiotropy = 0.4

For scenarios where bxy = 0, expect that false positive rate is appropriately controlled for IVW, weighted median, weighted mode.

set.seed(12345)
library(TwoSampleMR)

TwoSampleMR version 0.5.11 
[>] New: Option to use non-European LD reference panels for clumping etc
[>] Some studies temporarily quarantined to verify effect allele
[>] See news(package = 'TwoSampleMR') and https://gwas.mrcieu.ac.uk for further details

library(simulateGP)


Attaching package: 'simulateGP'

The following objects are masked from 'package:TwoSampleMR':

    allele_frequency, contingency, get_population_allele_frequency

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(ggplot2)
library(knitr)

sim <- function(nsnp, nid, plei, bxy, sim=1) {
    
    # Allele frequencies
    map <- arbitrary_map(runif(nsnp, 0.01, 0.99))

    # Effects for x
    paramsx <- generate_gwas_params(map=map, h2=0.4, S=-0.4, Pi=0.5)

    # Summary stats for x
    betax <- generate_gwas_ss(paramsx, nid=nid)

    # Effects for y
    paramsy <- generate_gwas_params(map=map, h2=plei, S=-0.4, Pi=0.5) %>% mutate(beta = beta + paramsx$beta * bxy)

    # summary stats for y
    betay <- generate_gwas_ss(paramsy, nid=nid)
    dat <- merge_exp_out(betax, betay) 

    
    bind_rows(
        # Analysis using thresholded instruments
        dat %>%
            filter(pval.exposure < 5e-8) %>%
            mr(., method_list=c("mr_ivw", "mr_weighted_median", "mr_weighted_mode")) %>%
            mutate(inst = "threshold"),
        # Analysis using all variants regardless of threshold
        dat %>%
            mr(., method_list=c("mr_ivw", "mr_weighted_median", "mr_weighted_mode")) %>%
            mutate(inst = "all"),
    ) %>%
        mutate(plei=plei, bxy=bxy, sim=sim)
}

sim(10000, 100000, 0.4, 0)

Analysing 'X' on 'Y'

Analysing 'X' on 'Y'

  id.exposure id.outcome outcome exposure                    method  nsnp
1           X          Y       Y        X Inverse variance weighted   346
2           X          Y       Y        X           Weighted median   346
3           X          Y       Y        X             Weighted mode   346
4           X          Y       Y        X Inverse variance weighted 10000
5           X          Y       Y        X           Weighted median 10000
6           X          Y       Y        X             Weighted mode 10000
              b           se      pval      inst plei bxy sim
1 -0.0132785411 1.862067e-02 0.4757792 threshold  0.4   0   1
2  0.0053904701 1.356861e-02 0.6911646 threshold  0.4   0   1
3  0.0325270703 4.133091e-02 0.4318272 threshold  0.4   0   1
4 -0.0089724040 1.006037e-02 0.3724701       all  0.4   0   1
5 -0.0003497783 8.313815e-03 0.9664413       all  0.4   0   1
6  4.6421866726 1.341177e+05 0.9999724       all  0.4   0   1

sim(10000, 100000, 0, 0.4)

Analysing 'X' on 'Y'
Analysing 'X' on 'Y'

  id.exposure id.outcome outcome exposure                    method  nsnp
1           X          Y       Y        X Inverse variance weighted   338
2           X          Y       Y        X           Weighted median   338
3           X          Y       Y        X             Weighted mode   338
4           X          Y       Y        X Inverse variance weighted 10000
5           X          Y       Y        X           Weighted median 10000
6           X          Y       Y        X             Weighted mode 10000
           b           se          pval      inst plei bxy sim
1  0.3646894 7.852256e-03  0.000000e+00 threshold    0 0.4   1
2  0.3515298 1.220235e-02 1.684991e-182 threshold    0 0.4   1
3  0.3313245 4.271241e-02  1.043069e-13 threshold    0 0.4   1
4  0.3242457 4.724237e-03  0.000000e+00       all    0 0.4   1
5  0.3190166 7.740202e-03  0.000000e+00       all    0 0.4   1
6 -2.2205335 2.014174e+03  9.991204e-01       all    0 0.4   1

params <- expand.grid(
    nsnp = 10000,
    nid = 100000,
    plei = c(0, 0.4),
    bxy = c(0, 0.4),
    sim = 1:100
)
res <- lapply(1:nrow(params), \(i) {
    do.call(sim, as.list(params[i,])) %>% suppressMessages()
})
res <- bind_rows(res)
save(res, file="res.RData")

Simulation results under bxy = 0

load(file="res.RData")
group_by(res, plei, bxy, method, inst) %>%
    summarise(
        mean_se = mean(se), 
        mean_beta = mean(b), 
        power = sum(pval < 0.05) / n(),
        minp = min(pval)
    ) %>% filter(bxy == 0) %>% kable()

`summarise()` has grouped output by 'plei', 'bxy', 'method'. You can override
using the `.groups` argument.

plei	method	inst	mean_se	mean_beta	power	minp
0.0	Inverse variance weighted	all	4.485900e-03	-0.0007847	0.01	0.0474438
0.0	Inverse variance weighted	threshold	7.827900e-03	-0.0006856	0.04	0.0245715
0.0	Weighted median	all	7.293300e-03	-0.0000214	0.00	0.0562561
0.0	Weighted median	threshold	1.140910e-02	-0.0000122	0.01	0.0359994
0.0	Weighted mode	all	3.678866e+05	3.0237440	0.00	0.9989783
0.0	Weighted mode	threshold	3.846430e-02	-0.0002435	0.00	0.1227612
0.4	Inverse variance weighted	all	9.991900e-03	0.0003727	0.08	0.0046424
0.4	Inverse variance weighted	threshold	1.808060e-02	-0.0015122	0.06	0.0133826
0.4	Weighted median	all	8.587800e-03	-0.0006092	0.04	0.0022668
0.4	Weighted median	threshold	1.339560e-02	-0.0013472	0.04	0.0137506
0.4	Weighted mode	all	1.069980e+06	7.1229065	0.00	0.9979213
0.4	Weighted mode	threshold	3.769040e-02	-0.0004433	0.01	0.0300066

Simulation results under bxy = 0.4

group_by(res, plei, bxy, method, inst) %>%
    summarise(
        mean_se = mean(se), 
        mean_beta = mean(b), 
        power = sum(pval < 0.05) / n(),
        minp = min(pval)
    ) %>% filter(bxy == 0.4) %>% kable()

`summarise()` has grouped output by 'plei', 'bxy', 'method'. You can override
using the `.groups` argument.

plei	bxy	method	inst	mean_se	mean_beta	power	minp
0.0	0.4	Inverse variance weighted	all	4.74690e-03	0.3203781	1	0.0000000
0.0	0.4	Inverse variance weighted	threshold	8.26810e-03	0.3607363	1	0.0000000
0.0	0.4	Weighted median	all	7.76120e-03	0.3170201	1	0.0000000
0.0	0.4	Weighted median	threshold	1.21785e-02	0.3478450	1	0.0000000
0.0	0.4	Weighted mode	all	5.44511e+05	-16.6974825	0	0.9988647
0.0	0.4	Weighted mode	threshold	4.15776e-02	0.3481614	1	0.0000000
0.4	0.4	Inverse variance weighted	all	1.01325e-02	0.3184288	1	0.0000000
0.4	0.4	Inverse variance weighted	threshold	1.83467e-02	0.3601333	1	0.0000000
0.4	0.4	Weighted median	all	8.99970e-03	0.2923377	1	0.0000000
0.4	0.4	Weighted median	threshold	1.41112e-02	0.3272884	1	0.0000000
0.4	0.4	Weighted mode	all	8.06599e+06	147.1660233	0	0.9984827
0.4	0.4	Weighted mode	threshold	4.08282e-02	0.3404088	1	0.0000000

Summary

Under no pleiotropy the false positive rate is controlled (actually median and mode are slightly over conservative)
Under pleiotropy the false positive rate is controlled for weighted mode but slightly inflated for weighted median
Not obvious that the bootstrap approach for obtaining standard errors here, which is used by weighted median and weighted mode, is performing particularly poorly.
This is only 100 replications so may be unstable but there isn’t something very obviously wrong here.

sessionInfo()

R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.45         ggplot2_3.4.2      dplyr_1.1.4        simulateGP_0.1.3  
[5] TwoSampleMR_0.5.11

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.2         rlang_1.1.3       xfun_0.42        
 [5] generics_0.1.3    jsonlite_1.8.8    data.table_1.14.8 glue_1.7.0       
 [9] colorspace_2.1-0  plyr_1.8.8        htmltools_0.5.7   scales_1.2.1     
[13] fansi_1.0.6       rmarkdown_2.26    grid_4.3.3        munsell_0.5.0    
[17] evaluate_0.23     tibble_3.2.1      fastmap_1.1.1     yaml_2.3.8       
[21] lifecycle_1.0.4   compiler_4.3.3    Rcpp_1.0.12       htmlwidgets_1.6.3
[25] pkgconfig_2.0.3   digest_0.6.34     R6_2.5.1          tidyselect_1.2.0 
[29] utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3    withr_3.0.0      
[33] gtable_0.3.3      tools_4.3.3