Rsq in simulations

Author

Gibran Hemani

Published

December 6, 2023

Background

What determines $R^2$ between X and Y in a linear model under confounding?

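$$
\begin{aligned}
Y &= a + bX + E \\
b &= \mathrm{cov}(X,Y)/\mathrm{var}(X) \\
R &= \mathrm{cov}(X,Y)/[\mathrm{sd}(X)\,\mathrm{sd}(Y)] = b\,\mathrm{sd}(X)/\mathrm{sd}(Y)
\end{aligned}
$$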
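As a quick sanity check (a sketch on arbitrary simulated data, not from the original analysis), the identity $R = b\,\mathrm{sd}(X)/\mathrm{sd}(Y)$ holds exactly for sample moments, so the OLS slope and the correlation carry the same information once both standard deviations are known:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)          # arbitrary illustrative model

b <- cov(x, y) / var(x)          # OLS slope
r_from_b <- b * sd(x) / sd(y)    # correlation recovered from the slope

all.equal(r_from_b, cor(x, y))              # TRUE
all.equal(b, unname(coef(lm(y ~ x))[2]))    # TRUE: same slope as lm()
```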

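Under unmeasured confounding, the OLS estimate of $b$ differs from the IV (causal) estimate. With G, U, ϵ and E mutually independent:

$$
\begin{aligned}
X &= a + \beta_1 G + \beta_2 U + \epsilon \\
Y &= a + b_1 X + b_2 U + E \\
b_{OLS} &= \mathrm{cov}(X,Y)/\mathrm{var}(X) \\
&= \mathrm{cov}(\beta_1 G + \beta_2 U + \epsilon,\ b_1(\beta_1 G + \beta_2 U + \epsilon) + b_2 U + E)/\mathrm{var}(X) \\
&= \mathrm{cov}(\beta_1 G + \beta_2 U + \epsilon,\ b_1\beta_1 G + (b_1\beta_2 + b_2) U + b_1\epsilon + E)/\mathrm{var}(X) \\
&= [b_1\beta_1^2\mathrm{var}(G) + (b_1\beta_2^2 + b_2\beta_2)\mathrm{var}(U) + b_1\mathrm{var}(\epsilon)]/\mathrm{var}(X) \\
&= b_1 + b_2\beta_2\mathrm{var}(U)/\mathrm{var}(X)
\end{aligned}
$$

and

$$
R_{OLS} = b_{OLS}\,\mathrm{sd}(X)/\mathrm{sd}(Y)
$$

Therefore the OLS $R^2$ between X and Y is

$$
R^2_{OLS} = \left[\frac{b_1\beta_1^2\mathrm{var}(G) + (b_1\beta_2^2 + b_2\beta_2)\mathrm{var}(U) + b_1\mathrm{var}(\epsilon)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}\right]^2
$$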
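The confounding bias can be checked numerically. The sketch below reuses the parameter values from the Check section (with $E = 0$) and compares the sample OLS slope with $b_1 + b_2\,\mathrm{cov}(X,U)/\mathrm{var}(X)$, which is an exact in-sample identity when $Y = b_1X + b_2U$, and with the population value $b_1 + b_2\beta_2\mathrm{var}(U)/\mathrm{var}(X)$, which follows from $\mathrm{cov}(X,Y) = b_1\mathrm{var}(X) + b_2\beta_2\mathrm{var}(U)$ when G, U, ϵ and E are independent and so matches only up to sampling error:

```r
set.seed(1)
b1 <- 0.2; b2 <- 3       # effects of X and U on Y
beta1 <- 4; beta2 <- 5   # effects of G and U on X
n <- 10000
u <- rnorm(n)            # confounder, var(U) = 1
g <- rnorm(n)            # instrument, var(G) = 1
eps <- rnorm(n)          # residual of X, var(epsilon) = 1
x <- beta1 * g + beta2 * u + eps
y <- b1 * x + b2 * u     # E = 0, as in the Check section

b_ols <- cov(x, y) / var(x)

# Exact in-sample decomposition: cov(x, y) = b1 * var(x) + b2 * cov(x, u)
b_identity <- b1 + b2 * cov(x, u) / var(x)

# Population value: var(X) = beta1^2 + beta2^2 + 1 = 42, cov(X, U) = beta2
b_pop <- b1 + b2 * beta2 * 1 / (beta1^2 + beta2^2 + 1)

c(sample = b_ols, identity = b_identity, population = b_pop)
```

The gap between `b_ols` and `b_pop` comes only from sample covariances among `g`, `u` and `eps` not being exactly zero.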

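Note that, with G, U, ϵ and E mutually independent (so that $\mathrm{cov}(X,U) = \beta_2\mathrm{var}(U)$),

$$
\begin{aligned}
\mathrm{var}(X) &= \mathrm{sd}(X)^2 = \beta_1^2\mathrm{var}(G) + \beta_2^2\mathrm{var}(U) + \mathrm{var}(\epsilon) \\
\mathrm{var}(Y) &= \mathrm{sd}(Y)^2 = b_1^2\mathrm{var}(X) + (b_2^2 + 2b_1b_2\beta_2)\mathrm{var}(U) + \mathrm{var}(E)
\end{aligned}
$$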
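A minimal sketch of these population formulae as a function of the model parameters, assuming G, U, ϵ and E are mutually independent (the function name `ols_r2` and its arguments are hypothetical, for illustration only). Note that var(Y) needs the cross term $2b_1b_2\beta_2\mathrm{var}(U)$ because X and U are correlated:

```r
# Population moments of the confounded model, assuming G, U, eps, E independent
ols_r2 <- function(b1, b2, beta1, beta2,
                   var_g = 1, var_u = 1, var_eps = 1, var_e = 0) {
  var_x  <- beta1^2 * var_g + beta2^2 * var_u + var_eps
  cov_xu <- beta2 * var_u
  var_y  <- b1^2 * var_x + b2^2 * var_u + 2 * b1 * b2 * cov_xu + var_e
  cov_xy <- b1 * var_x + b2 * cov_xu
  list(var_x  = var_x,
       var_y  = var_y,
       b_ols  = cov_xy / var_x,
       r2_ols = cov_xy^2 / (var_x * var_y))
}

# Parameters from the Check section (var(E) = 0)
res <- ols_r2(b1 = 0.2, b2 = 3, beta1 = 4, beta2 = 5)
str(res)
```

With these parameters the population OLS $R^2$ is about 0.78, in line with the simulated `cor(x, y)^2`.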

So ultimately, to hold $R^2$ fixed across different values of the effect parameters, you can scale the residual variances var(ϵ) and var(E) according to these formulae.
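For example, because $\mathrm{cov}(X,Y)$ does not depend on var(E), the var(E) that gives a target OLS $R^2$ can be solved for directly from $R^2 = \mathrm{cov}(X,Y)^2/[\mathrm{var}(X)\mathrm{var}(Y)]$. A minimal sketch, assuming independent G, U, ϵ and E; `var_e_for_r2` is a hypothetical helper name, and it returns `NA` when the target would need a negative var(E):

```r
var_e_for_r2 <- function(r2, b1, b2, beta1, beta2,
                         var_g = 1, var_u = 1, var_eps = 1) {
  var_x  <- beta1^2 * var_g + beta2^2 * var_u + var_eps
  cov_xy <- b1 * var_x + b2 * beta2 * var_u
  # var(Y) required so that cov(X,Y)^2 / (var(X) var(Y)) equals the target
  var_y_target <- cov_xy^2 / (r2 * var_x)
  # part of var(Y) that does not come from E
  var_y_fixed <- b1^2 * var_x + (b2^2 + 2 * b1 * b2 * beta2) * var_u
  var_e <- var_y_target - var_y_fixed
  if (var_e < 0) NA_real_ else var_e
}

set.seed(1)
n <- 100000
b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5
ve <- var_e_for_r2(0.3, b1, b2, beta1, beta2)

u <- rnorm(n)
g <- rnorm(n)
x <- beta1 * g + beta2 * u + rnorm(n)
y <- b1 * x + b2 * u + rnorm(n, sd = sqrt(ve))
cor(x, y)^2   # should be close to the 0.3 target
```

Targets above the $R^2$ reached with var(E) = 0 (about 0.78 here) are infeasible for fixed var(ϵ), which is why the helper returns `NA` in that case; var(ϵ) would have to change instead.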

By contrast, the variance in Y explained by the causal effect of X is

$$
R^2_{IV,X,Y} = b_1^2\mathrm{var}(X)/\mathrm{var}(Y)
$$
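Using the parameters from the Check section as an illustration (a sketch assuming independent G, U, ϵ, E with var(G) = var(U) = var(ϵ) = 1 and var(E) = 0), the confounded and causal quantities can be very different:

```r
b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5
var_x  <- beta1^2 + beta2^2 + 1                        # 42
var_y  <- b1^2 * var_x + (b2^2 + 2 * b1 * b2 * beta2)  # var(U) = 1, var(E) = 0
cov_xy <- b1 * var_x + b2 * beta2

r2_ols    <- cov_xy^2 / (var_x * var_y)   # confounded association
r2_causal <- b1^2 * var_x / var_y         # variance via the causal path X -> Y
c(ols = r2_ols, causal = r2_causal)
```

Here OLS attributes roughly 78% of var(Y) to X, while only about 10% flows through the causal effect of X; the rest is due to U.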

Check

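set.seed(1)
b1 <- 0.2     # causal effect of X on Y
b2 <- 3       # effect of the confounder U on Y
beta1 <- 4    # effect of G on X
beta2 <- 5    # effect of the confounder U on X
n <- 10000
u <- rnorm(n) # unmeasured confounder, var(U) = 1
g <- rnorm(n) # instrument, var(G) = 1
x <- u * beta2 + g * beta1 + rnorm(n)   # var(epsilon) = 1
y <- u * b2 + x * b1 + rnorm(n, sd=0)   # sd = 0, so var(E) = 0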

Beta

cov(x, y)/var(x)
[1] 0.562716
summary(lm(y ~ x))$coef[2,1]
[1] 0.562716
# analytic value without the b1 * var(epsilon) contribution to cov(x, y)
( b1*beta1^2*var(g) + (b1*beta2^2 + b2*beta2) * var(u) ) / var(x)
[1] 0.5557103

Correlation

((( b1*beta1^2*var(g) + (b1*beta2^2 + b2*beta2) * var(u) ) / var(x)) * sd(x) / sd(y))^2
[1] 0.768337
cor(x, y)^2
[1] 0.7878316

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.3 compiler_4.3.2    fastmap_1.1.1     cli_3.6.1        
 [5] tools_4.3.2       htmltools_0.5.7   yaml_2.3.7        rmarkdown_2.25   
 [9] knitr_1.45        jsonlite_1.8.7    xfun_0.41         digest_0.6.33    
[13] rlang_1.1.2       evaluate_0.23