Rsq in simulations

Author

Gibran Hemani

Published

December 6, 2023

Background

What determines $R^2$ between X and Y in a linear model under confounding?

$$
\begin{aligned}
Y &= a + bX + E \\
b &= cov(X, Y) / var(X) \\
R &= cov(X, Y) / [sd(X)sd(Y)] \\
&= b \, sd(X) / sd(Y)
\end{aligned}
$$
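A quick numerical check of the last identity, using arbitrary illustrative values (not from the original post): in any sample, the OLS slope rescaled by $sd(X)/sd(Y)$ is exactly the Pearson correlation.

```r
# Arbitrary illustrative values (not from the original post)
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

b <- cov(x, y) / var(x)                 # OLS slope from the covariance formula
b_lm <- unname(coef(lm(y ~ x))["x"])    # same slope from lm()
r <- b * sd(x) / sd(y)                  # slope rescaled by sd(X) / sd(Y)

# Both identities hold up to floating-point error
stopifnot(isTRUE(all.equal(b, b_lm)), isTRUE(all.equal(r, cor(x, y))))
```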

Under unmeasured confounding, the OLS estimate of b differs from the IV (causal) estimate. Suppose an instrument G and a confounder U both influence X, and U also influences Y:

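$$
\begin{aligned}
X &= a + \beta_1 G + \beta_2 U + \epsilon \\
Y &= a + b_1 X + b_2 U + E \\
b_{OLS} &= cov(X, Y) / var(X) \\
&= cov(\beta_1 G + \beta_2 U + \epsilon, b_1(\beta_1 G + \beta_2 U + \epsilon) + b_2 U + E)/var(X) \\
&= cov(\beta_1 G + \beta_2 U + \epsilon, b_1\beta_1 G + b_1\beta_2 U + b_1\epsilon + b_2 U + E)/var(X) \\
&= [b_1\beta_1^2 var(G) + (b_1\beta_2^2 + b_2\beta_2) var(U) + b_1 var(\epsilon)]/var(X) \\
&= b_1 + b_2\beta_2 var(U)/var(X)
\end{aligned}
$$

assuming G, U, $\epsilon$ and E are mutually independent. The $b_1 var(\epsilon)$ term appears because $\epsilon$ reaches Y through X, and the last line shows the confounding bias, $b_2\beta_2 var(U)/var(X)$, directly. The correlation is

$$R_{OLS} = b_{OLS}\, sd(X)/sd(Y)$$

so the OLS $R^2$ of X and Y is

$$R^2 = \left[ \frac{b_1\beta_1^2 var(G) + (b_1\beta_2^2 + b_2\beta_2) var(U) + b_1 var(\epsilon)}{sd(X)\, sd(Y)} \right]^2$$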
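The derivation can be checked numerically. The sketch below uses a hypothetical helper, `ols_pop()`, which is not part of the original post. It assumes G, U, $\epsilon$ and E are mutually independent, computes $var(Y)$ including the cross term $2b_1b_2\beta_2 var(U)$ that arises because X and U are correlated, and returns the population $b_{OLS}$ and $R^2$ after checking that the expanded and compact forms of $b_{OLS}$ agree.

```r
# Hypothetical helper (not from the original post): population OLS slope
# and R^2 of Y on X in the confounded model, assuming G, U, epsilon and E
# are mutually independent.
ols_pop <- function(b1, b2, beta1, beta2,
                    var_g = 1, var_u = 1, var_eps = 1, var_e = 0) {
  var_x <- beta1^2 * var_g + beta2^2 * var_u + var_eps
  # cov(X, Y) term by term, as in the expanded line of the derivation
  cov_xy <- b1 * beta1^2 * var_g + (b1 * beta2^2 + b2 * beta2) * var_u +
    b1 * var_eps
  # var(Y) including the 2 * b1 * b2 * cov(X, U) cross term
  var_y <- b1^2 * var_x + b2^2 * var_u + 2 * b1 * b2 * beta2 * var_u + var_e
  b_ols <- cov_xy / var_x
  # Compact form: causal effect plus confounding bias
  stopifnot(isTRUE(all.equal(b_ols, b1 + b2 * beta2 * var_u / var_x)))
  list(b_ols = b_ols, rsq = cov_xy^2 / (var_x * var_y))
}

# Effect sizes used in the Check section
ols_pop(b1 = 0.2, b2 = 3, beta1 = 4, beta2 = 5)
```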

Note that, because $cov(X, U) = \beta_2 var(U)$,

$$
\begin{aligned}
var(X) &= sd(X)^2 = \beta_1^2 var(G) + \beta_2^2 var(U) + var(\epsilon) \\
var(Y) &= sd(Y)^2 = b_1^2 var(X) + b_2^2 var(U) + 2 b_1 b_2 \beta_2 var(U) + var(E)
\end{aligned}
$$

So, to hold $R^2$ fixed across different effect-size parameters, you can scale the residual variances $var(\epsilon)$ and $var(E)$ according to these formulae.
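One way to do this, sketched below, is to fix $var(\epsilon)$ and solve $R^2 = cov(X,Y)^2/[var(X)\,var(Y)]$ for $var(E)$. The helper `solve_var_e()` is hypothetical (not from the original post); it assumes unit variances for G, U and $\epsilon$ by default and stops with an error if the target exceeds the $R^2$ reached when $var(E) = 0$.

```r
# Hypothetical helper (not from the original post): residual variance var(E)
# needed for the population OLS R^2 of X and Y to equal target_rsq.
# Assumes G, U, epsilon and E are mutually independent.
solve_var_e <- function(target_rsq, b1, b2, beta1, beta2,
                        var_g = 1, var_u = 1, var_eps = 1) {
  var_x <- beta1^2 * var_g + beta2^2 * var_u + var_eps
  cov_xy <- b1 * var_x + b2 * beta2 * var_u
  # var(Y) without E, including the 2 * b1 * b2 * cov(X, U) cross term
  var_y0 <- b1^2 * var_x + b2^2 * var_u + 2 * b1 * b2 * beta2 * var_u
  var_e <- cov_xy^2 / (target_rsq * var_x) - var_y0
  if (var_e < 0) {
    stop("target_rsq is above the maximum R^2, reached when var(E) = 0")
  }
  var_e
}

# Usage: the Check section's effect sizes, but a target R^2 of 0.3
set.seed(2)
b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5
n <- 100000
var_e <- solve_var_e(0.3, b1, b2, beta1, beta2)
u <- rnorm(n)
g <- rnorm(n)
x <- u * beta2 + g * beta1 + rnorm(n)
y <- u * b2 + x * b1 + rnorm(n, sd = sqrt(var_e))
cor(x, y)^2  # should be close to 0.3, up to sampling error
```

Solving for $var(E)$ alone leaves the distribution of X, and hence the instrument strength $\beta_1^2 var(G)/var(X)$, unchanged; scaling $var(\epsilon)$ instead would also change $var(X)$.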

By contrast, the proportion of variance in Y explained by the causal effect of X alone is

$$R^2_{IV, X,Y} = b^2_1 var(X) / var(Y)$$
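To see how far apart the two can be, here is a short population calculation with the effect sizes from the Check section below, taking $var(G) = var(U) = var(\epsilon) = 1$ and $var(E) = 0$ as in that simulation (these variance values are the simulation's settings, assumed here for the calculation).

```r
# Effect sizes from the Check section; unit variances for G, U, epsilon
# and var(E) = 0, matching that simulation
b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5
var_x <- beta1^2 + beta2^2 + 1
var_y <- b1^2 * var_x + b2^2 + 2 * b1 * b2 * beta2
cov_xy <- b1 * var_x + b2 * beta2

rsq_iv <- b1^2 * var_x / var_y          # variance explained causally by X
rsq_ols <- cov_xy^2 / (var_x * var_y)   # squared OLS correlation
c(iv = rsq_iv, ols = rsq_ols)
```

With these values the causal $R^2$ is about 0.10, while the OLS $R^2$ is about 0.78, so almost all of the observational association comes from the confounder.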

Check

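```r
set.seed(1)
# Causal effect of X on Y (b1) and effect of the confounder U on Y (b2)
b1 <- 0.2
b2 <- 3
# Effects of the instrument G (beta1) and the confounder U (beta2) on X
beta1 <- 4
beta2 <- 5
n <- 10000
u <- rnorm(n)
g <- rnorm(n)
# Residual of X has variance 1
x <- u * beta2 + g * beta1 + rnorm(n)
# Residual of Y has variance 0 (sd = 0 gives a vector of zeros)
y <- u * b2 + x * b1 + rnorm(n, sd=0)
```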

Beta

```r
cov(x, y)/var(x)
#> [1] 0.562716
summary(lm(y ~ x))$coef[2,1]
#> [1] 0.562716
# Population formula evaluated with sample variances (omits b1 * var(epsilon))
( b1*beta1^2*var(g) + (b1*beta2^2 + b2*beta2) * var(u) ) / var(x)
#> [1] 0.5557103
```

Correlation

```r
# OLS R^2 implied by the same formula (also omits b1 * var(epsilon))
((( b1*beta1^2*var(g) + (b1*beta2^2 + b2*beta2) * var(u) ) / var(x)) * sd(x) / sd(y))^2
#> [1] 0.768337
cor(x, y)^2
#> [1] 0.7878316
```
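The expressions in this check omit the $b_1 var(\epsilon)$ term. The sketch below repeats the simulation, drawing the residual of X into a named vector `eps` (a name introduced here) in the same order as before, so `x` and `y` are the same as above, and then adds that term back. It also computes the exact in-sample identity $b_{OLS} = b_1 + b_2\,cov(x, u)/var(x)$, which holds because `y` has no residual noise; any remaining gap between the formula and the fitted slope reflects chance covariances among `g`, `u` and `eps` in this sample.

```r
# Same seed and draw order as the Check section, so x and y are unchanged
set.seed(1)
b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5
n <- 10000
u <- rnorm(n)
g <- rnorm(n)
eps <- rnorm(n)                 # residual of X, now stored
x <- u * beta2 + g * beta1 + eps
y <- u * b2 + x * b1            # var(E) = 0, as above

b_ols <- cov(x, y) / var(x)
# Formula including b1 * var(epsilon), evaluated with sample variances
b_formula <- (b1 * beta1^2 * var(g) + (b1 * beta2^2 + b2 * beta2) * var(u) +
  b1 * var(eps)) / var(x)
# Exact in-sample identity, since y = b2 * u + b1 * x exactly
b_exact <- b1 + b2 * cov(x, u) / var(x)
c(b_ols = b_ols, b_formula = b_formula, b_exact = b_exact)
```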

```r
sessionInfo()
```

```
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.3 compiler_4.3.2    fastmap_1.1.1     cli_3.6.1
[5] tools_4.3.2       htmltools_0.5.7   yaml_2.3.7        rmarkdown_2.25
[9] knitr_1.45        jsonlite_1.8.7    xfun_0.41         digest_0.6.33
[13] rlang_1.1.2       evaluate_0.23
```