Gibran Hemani’s lab book - Sibling replication of Backman rare variants

Objective

Backman et al 2022 (https://www.nature.com/articles/s41586-021-04103-z) performed exome GWAS of 4k traits, and found 564 rare-variant - trait pairs. They corrected for 20 PCs from common variants and 20 PCs from rare variants.

Question: Is uncontrolled recent population stratification biasing these associations or even driving false positives?

We could use siblings to estimate the direct effects for these discovery associations. There would be two major hits to power, first only 20k of the 500k UK Biobank will be used. Second, power to detect direct effects using siblings is reduced. So we can’t conclude that these are false positives if they simply don’t replicate in the sibling analysis.

Instead, we can ask to what extent are the replication associations consistent with the original estimates, given the reduction in precision. i.e. What fraction are expected to replicate at some nominal threshold, and what fraction are observed to replicate. If there is a significant difference between the observed and expected replication rate, then that indicates uncorrected inflation in the discovery exome GWAS.

This analysis: Power calculation - how much shrinkage would there need to be before there is 80% power to detect a difference between the sibling estimate and the population estimate?

Read in Backman results

There were 564 rare-variant - trait pairs in Backman et al 2022. Of these 484 were for quantitative traits, restricting to those here for simplicity.

library(here)

here() starts at /Users/gh13047/repo/lab-book

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(ggplot2)
set.seed(12345)
dat <- read.csv("Book3.csv", stringsAsFactors=FALSE, header=TRUE) %>% 
  as_tibble() %>%
  mutate(se = (uci - beta)/1.96)
dat %>% glimpse()

Rows: 484
Columns: 14
$ Effect..95..CI.                               <chr> "-0.15 (-0.19, -0.11)", …
$ P.value                                       <dbl> 9.25e-12, 1.24e-24, 1.03…
$ Effect.direction                              <chr> "-", "-", "-", "+", "-",…
$ N.cases.with.0.1.2.copies.of.effect.allele    <chr> "88038|1684|13", "418277…
$ N.controls.with.0.1.2.copies.of.effect.allele <chr> "NA|NA|NA", "NA|NA|NA", …
$ Effect.allele.frequency                       <dbl> 9.5e-03, 2.1e-04, 4.9e-0…
$ AA                                            <int> 88038, 418277, 407938, 3…
$ AB                                            <int> 1684, 177, 4027, 4076, 1…
$ BB                                            <int> 13, 0, 11, 1, 8, 0, 1, 0…
$ N                                             <int> 89735, 418454, 411976, 3…
$ beta                                          <dbl> -0.150, -0.720, -0.100, …
$ lci                                           <chr> "-0.19", "-0.86", "-0.13…
$ uci                                           <dbl> -0.110, -0.580, -0.079, …
$ se                                            <dbl> 0.020408163, 0.071428571…

plot(-log10(dat$P.value), abs(dat$beta)/dat$se)

Functions from Alex for relative sample size between designs:

unrel_vs_sib = function(h2){
  return(1+h2*(1-h2)/(4-h2))
}

unrel_vs_sib_inv = function(h2){
  return((1+h2*(1-h2)/(4-h2))^(-1))
}

unrel_vs_trio = function(h2){
  return(1+h2*(1-h2)/(3-h2*(1+h2/2)))
}

unrel_vs_trio_inv = function(h2){
  return((1+h2*(1-h2)/(3-h2*(1+h2/2)))^(-1))
}

unrel_vs_direct_trio = function(h2){return(0.5/3)}
unrel_vs_direct_sibdiff = function(h2){return((2*(2-h2))^(-1))}
unrel_vs_direct_sibimp = function(h2){return((2+h2/2)/(6*(1-(h2/2)^2)))}

Function to estimate expected replication rate

prop_overlap <- function(b_disc, b_rep, se_disc, se_rep, alpha)
{
  p_sign <- pnorm(-abs(b_disc) / se_disc) * pnorm(-abs(b_disc) / se_rep) + ((1 - pnorm(-abs(b_disc) / se_disc)) * (1 - pnorm(-abs(b_disc) / se_rep)))
  p_sig <- pnorm(-abs(b_disc) / se_rep + qnorm(alpha / 2)) + (1 - pnorm(-abs(b_disc) / se_rep - qnorm(alpha / 2)))
  p_rep <- pnorm(abs(b_rep)/se_rep, lower.tail=FALSE)
  res <- tibble::tibble(
    nsnp=length(b_disc),
    metric=c("Sign", "Sign", "P-value", "P-value"),
    datum=c("Expected", "Observed", "Expected", "Observed"),
    value=c(sum(p_sign, na.rm=TRUE), sum(sign(b_disc) == sign(b_rep)), sum(p_sig, na.rm=TRUE), sum(p_rep < alpha, na.rm=TRUE))
  )
  pdif <- list(
    Sign = binom.test(res$value[2], res$nsnp[1], res$value[1] / res$nsnp[1])$p.value,
    `P-value` = binom.test(res$value[4], res$nsnp[1], res$value[3] / res$nsnp[1])$p.value
  ) %>% as_tibble()
  return(list(res=res, pdif=pdif, variants=dplyr::tibble(sig=p_sig, sign=p_sign)))
}

Function to simulate assocs in siblings

expected_se <- function (beta, af, n, vy) 
{
    sqrt(c(vy) - beta^2 * 2 * af * (1 - af))/sqrt(n) * (1/sqrt(2 * 
        af * (1 - af)))
}

expected_vy <- function(beta, af, n, se)
{
  se^2 * 2 * af * (1 - af)*n + beta^2 * 2 * af * (1 - af)
}

# check algebra!
# expected_se(0.2, 0.4, 1000, 10)
# expected_vy(0.2, 0.4, 1000, expected_se(0.2, 0.4, 1000, 10))

simgwas_sib <- function(beta_disc, se_disc, af_disc, n_disc, prop_sibs, h2, alpha, shrinkage)
{
  d <- tibble(
    beta_disc = beta_disc,
    beta_true = beta_disc * shrinkage,
    vy = expected_vy(beta_disc, af_disc, n_disc, se_disc),
    n_rep = n_disc * prop_sibs * unrel_vs_direct_sibimp(h2),
    se_rep = expected_se(beta_disc, af_disc, n_rep, vy),
    beta_rep = rnorm(length(beta_disc), beta_true, se_rep)
  ) 
  d %>%
    {prop_overlap(.$beta_disc, .$beta_rep, se_disc, .$se_rep, alpha)}
}

Power analysis

params <- expand.grid(sim=1:500, shrinkage=seq(0,1,0.05))
res <- lapply(1:nrow(params), function(i) 
  simgwas_sib(dat$beta, dat$se, dat$Effect.allele.frequency, dat$N, 20000/500000,0.5, 0.05/nrow(dat), params$shrinkage[i])
  ) %>%
  lapply(., function(x) x$pdif) %>% bind_rows() %>% bind_cols(., params)
res

# A tibble: 10,500 × 4
        Sign `P-value`   sim shrinkage
       <dbl>     <dbl> <int>     <dbl>
 1 1.04e- 91  2.33e-11     1         0
 2 3.19e- 95  2.33e-11     2         0
 3 7.72e- 91  2.33e-11     3         0
 4 4.13e- 89  2.33e-11     4         0
 5 3.52e-105  2.33e-11     5         0
 6 6.78e- 98  2.33e-11     6         0
 7 1.85e- 93  2.33e-11     7         0
 8 4.14e- 96  2.33e-11     8         0
 9 1.76e-118  9.03e-10     9         0
10 9.63e-112  2.33e-11    10         0
# … with 10,490 more rows

res %>% 
  group_by(shrinkage) %>%
  summarise(Sign=sum(Sign < 0.05)/n(), `P-value`=sum(`P-value` < 0.05)/n()) %>%
  tidyr::gather(key="key", value="value", c(`Sign`, `P-value`)) %>%
  ggplot(., aes(x=shrinkage, y=value)) +
    geom_point(aes(colour=key)) +
    geom_line(aes(colour=key)) +
    labs(x="Shrinkage of effect size, or 1-FDR", y="Power to detect difference from expectation")

Summary

Power is quite high to detect a difference!

sessionInfo()

R version 4.2.1 Patched (2022-09-06 r82817)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.4.0 dplyr_1.0.10  here_1.0.1   

loaded via a namespace (and not attached):
 [1] pillar_1.8.1      compiler_4.2.1    tools_4.2.1       digest_0.6.31    
 [5] jsonlite_1.8.4    evaluate_0.19     lifecycle_1.0.3   tibble_3.1.8     
 [9] gtable_0.3.1      pkgconfig_2.0.3   rlang_1.0.6       cli_3.5.0        
[13] DBI_1.1.3         yaml_2.3.6        xfun_0.36         fastmap_1.1.0    
[17] withr_2.5.0       stringr_1.5.0     knitr_1.41        generics_0.1.3   
[21] vctrs_0.5.1       htmlwidgets_1.5.4 rprojroot_2.0.3   grid_4.2.1       
[25] tidyselect_1.2.0  glue_1.6.2        R6_2.5.1          fansi_1.0.3      
[29] rmarkdown_2.16    farver_2.1.1      tidyr_1.2.1       purrr_1.0.0      
[33] magrittr_2.0.3    ellipsis_0.3.2    scales_1.2.1      htmltools_0.5.4  
[37] assertthat_0.2.1  colorspace_2.0-3  labeling_0.4.2    utf8_1.2.2       
[41] stringi_1.7.8     munsell_0.5.0