Background
Observed that GIANT BMI PRS associated with BMI in UKB with b=1.1 and in colorectal cancer cases with b=0.7. Can this be induced just from non-confounded collider bias of the variants becoming correlated withing the sample:
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
n <- 1000000
# 1. simulate population with 10 snps, and the snps have betas on BMI
# 2. BMI has an influence on X incidence
# 3. U has an influence on X incidence and X progression
G <- sapply (1 : 10 , function (i) rbinom (n, 2 , 0.3 ))
dim (G)
b_g_bmi <- rnorm (10 , 0 , 0.1 )
bmi <- G %*% b_g_bmi + rnorm (n, 0 , 1 )
u <- rnorm (n)
b_u_x <- 0.5
X <- rbinom (n, 1 , plogis (- 2 + 0.2 * bmi + b_u_x * u))
table (X)
b_u_p <- 0.5
P <- u * b_u_p + rnorm (n)
library (simulateGP)
bhat_controls <- gwas (bmi[X == 0 ], G[X == 0 , ])
summary (lm (bhat_controls$ bhat ~ b_g_bmi))
Call:
lm(formula = bhat_controls$bhat ~ b_g_bmi)
Residuals:
Min 1Q Median 3Q Max
-0.0026202 -0.0011998 0.0003991 0.0011478 0.0019200
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0005517 0.0005388 -1.024 0.336
b_g_bmi 0.9933208 0.0059206 167.774 1.78e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.001701 on 8 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
F-statistic: 2.815e+04 on 1 and 8 DF, p-value: 1.782e-15
bhat_cases <- gwas (bmi[X == 1 ], G[X == 1 , ])
summary (lm (bhat_cases$ bhat ~ b_g_bmi))
Call:
lm(formula = bhat_cases$bhat ~ b_g_bmi)
Residuals:
Min 1Q Median 3Q Max
-0.006455 -0.003318 0.001232 0.002730 0.005289
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.000363 0.001386 -0.262 0.8
b_g_bmi 0.992180 0.015227 65.158 3.42e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.004376 on 8 degrees of freedom
Multiple R-squared: 0.9981, Adjusted R-squared: 0.9979
F-statistic: 4246 on 1 and 8 DF, p-value: 3.424e-12
cor_controls <- cor (G[X == 0 , ])
cor_all <- cor (G)
cor_cases <- cor (G[X == 1 ,])
mean (cor_controls[lower.tri (cor_controls)]^ 2 )
mean (cor_cases[lower.tri (cor_controls)]^ 2 )
mean (cor_all[lower.tri (cor_controls)]^ 2 )
a <- rnorm (1000 , mean= 0 )
pchisq (sum (a^ 2 ), 1000 , lower.tail = FALSE )
a <- rnorm (1000 , mean= 0.6 )
pchisq (sum (a^ 2 ), 1000 , lower.tail = FALSE )
a <- rnorm (1000 , sd= 0.1 )
pchisq (sum (a^ 2 ), 1000 , lower.tail = FALSE )
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] simulateGP_0.1.3 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] digest_0.6.37 utf8_1.2.4 R6_2.5.1 fastmap_1.2.0
[5] tidyselect_1.2.1 xfun_0.48 magrittr_2.0.3 glue_1.8.0
[9] tibble_3.2.1 knitr_1.48 pkgconfig_2.0.3 htmltools_0.5.8.1
[13] rmarkdown_2.27 generics_0.1.3 lifecycle_1.0.4 cli_3.6.3
[17] fansi_1.0.6 vctrs_0.6.5 compiler_4.4.3 tools_4.4.3
[21] pillar_1.9.0 evaluate_1.0.1 yaml_2.3.10 rlang_1.1.4
[25] jsonlite_1.8.9 htmlwidgets_1.6.4