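set.seed(1)
b1 <- 0.2      # causal effect of X on Y
b2 <- 3        # effect of the confounder U on Y
beta1 <- 4     # effect of the instrument G on X
beta2 <- 5     # effect of the confounder U on X
n <- 10000
u <- rnorm(n)  # unmeasured confounder
g <- rnorm(n)  # instrument
x <- u * beta2 + g * beta1 + rnorm(n)
y <- u * b2 + x * b1 + rnorm(n, sd=0)   # residual variance var(E) = 0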
Background
What determines \(R^2\) between X and Y in a linear model under confounding
\[ \begin{aligned} Y &= a + bX + E \\ b &= cov(X, Y) / var(X) \\ R &= cov(X, Y) / [sd(X)sd(Y)] \\ &= b \cdot sd(X) / sd(Y) \end{aligned} \]
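A quick numerical check of these identities, as a sketch on illustrative simulated data with no confounding (the seed, sample size, intercept 1 and slope 0.5 are arbitrary choices, not from the original). It is wrapped in `local()` so that it does not overwrite the `x`, `y` and `n` used elsewhere here.

```r
# Check b = cov(X, Y) / var(X) and R = b * sd(X) / sd(Y) on illustrative data.
# local() keeps these x, y, n separate from the main simulation.
identity_check <- local({
  set.seed(42)
  n <- 1000
  x <- rnorm(n)
  y <- 1 + 0.5 * x + rnorm(n)           # illustrative intercept and slope
  b_cov <- cov(x, y) / var(x)           # b from the covariance formula
  b_lm  <- unname(coef(lm(y ~ x))[2])   # slope reported by lm()
  r_from_b <- b_cov * sd(x) / sd(y)     # R = b * sd(X) / sd(Y)
  c(slope_match = isTRUE(all.equal(b_cov, b_lm)),
    r_match     = isTRUE(all.equal(r_from_b, cor(x, y))))
})
identity_check
```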
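In OLS, \(b\) will differ from the IV estimate if there is unmeasured confounding. Take an instrument G and an unmeasured confounder U, with G, U, \(\epsilon\) and E mutually independent: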
\[ \begin{aligned} X &= a + \beta_1 G + \beta_2 U + \epsilon \\ Y &= a + b_1 X + b_2 U + E \\ b_{OLS} &= cov(X, Y) / var(X) \\ &= cov(\beta_1 G + \beta_2 U + \epsilon, b_1(\beta_1 G + \beta_2 U + \epsilon) + b_2 U)/var(X) \\ &= cov(\beta_1 G + \beta_2 U + \epsilon, b_1\beta_1 G + b_1\beta_2 U + b_1\epsilon + b_2 U)/var(X) \\ &= [b_1\beta_1^2 var(G) + (b_1\beta_2^2 + b_2\beta_2) var(U) + b_1 var(\epsilon)]/var(X) \\ &= b_1 + b_2\beta_2 var(U)/var(X) \end{aligned} \]
and
\[ R_{OLS} = b_{OLS} sd(x)/sd(y) \]
therefore the OLS \(R^2\) between X and Y is
\[ R^2 = \left [ \frac{b_1\beta_1^2 var(G) + (b_1\beta_2^2 + b_2\beta_2) var(U) + b_1 var(\epsilon)}{sd(x) sd(y)} \right]^2 \]
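The derivation can be checked against `lm()` on a fresh, larger simulation. This sketch reuses the parameter values from the simulation at the top but assumes its own seed and \(n = 10^5\). It compares the OLS slope with the sample-moment expression (including the \(b_1 var(\epsilon)\) contribution, which appears because \(\epsilon\) reaches Y through X) and with the population value \(b_1 + b_2\beta_2 var(U)/var(X)\), taking \(var(G) = var(U) = var(\epsilon) = 1\).

```r
# Compare the OLS slope with the analytic b_OLS under confounding.
ols_check <- local({
  set.seed(2)
  n <- 1e5
  b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5   # values from the simulation above
  g <- rnorm(n); u <- rnorm(n); e_x <- rnorm(n)
  x <- beta1 * g + beta2 * u + e_x
  y <- b1 * x + b2 * u                          # var(E) = 0, as in rnorm(n, sd=0) above
  # Sample-moment version of the derivation, including b1 * var(epsilon)
  b_formula <- (b1 * beta1^2 * var(g) +
                (b1 * beta2^2 + b2 * beta2) * var(u) +
                b1 * var(e_x)) / var(x)
  # Population value with var(G) = var(U) = var(epsilon) = 1
  b_population <- b1 + b2 * beta2 / (beta1^2 + beta2^2 + 1)
  c(ols = unname(coef(lm(y ~ x))[2]),
    formula = b_formula,
    population = b_population)
})
ols_check
```

The remaining gap between `ols` and `formula` comes from the small sample covariances between G, U and \(\epsilon\), which the derivation sets to zero.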
Note that
\[ \begin{aligned} var(x) &= sd(x)^2 = \beta_1^2var(G) + \beta_2^2var(U) + var(\epsilon) \\ var(y) &= sd(y)^2 = b_1^2var(X) + b_2^2var(U) + 2b_1b_2\beta_2var(U) + var(E) \end{aligned} \]
where \(2b_1b_2\beta_2var(U) = 2b_1b_2cov(X, U)\) is non-zero because U affects both X and Y.
So to hold \(R^2\) fixed across different effect parameters, you should be able to scale the residual variances \(var(\epsilon)\) and \(var(E)\) according to these formulae.
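One way to do the scaling for \(var(E)\), sketched under the assumption that \(var(G)\), \(var(U)\) and \(var(\epsilon)\) are known and fixed: \(cov(X, Y) = b_1 var(X) + b_2\beta_2 var(U)\) does not depend on \(var(E)\), so a target \(R^2 = cov(X, Y)^2 / [var(X) var(Y)]\) pins down \(var(Y)\) and hence \(var(E)\). The helper name `residual_var_for_r2` is hypothetical, and its \(var(Y)\) includes the \(2b_1b_2\beta_2 var(U)\) term because X and U are correlated.

```r
# Hypothetical helper: residual variance var(E) giving a target OLS R^2 of X, Y
residual_var_for_r2 <- function(r2, b1, b2, beta1, beta2,
                                var_g = 1, var_u = 1, var_eps = 1) {
  var_x  <- beta1^2 * var_g + beta2^2 * var_u + var_eps
  cov_xy <- b1 * var_x + b2 * beta2 * var_u          # cov(X, Y), free of var(E)
  # var(Y) without E; 2 * b1 * b2 * cov(X, U) with cov(X, U) = beta2 * var(U)
  var_y_no_e <- b1^2 * var_x + b2^2 * var_u + 2 * b1 * b2 * beta2 * var_u
  var_e <- cov_xy^2 / (r2 * var_x) - var_y_no_e
  if (var_e < 0) stop("target R^2 is higher than is reachable with var(E) = 0")
  var_e
}

# Simulate with the scaled residual variance and check the realised R^2
r2_check <- local({
  set.seed(3)
  n <- 1e5
  b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5   # values from the simulation above
  var_e <- residual_var_for_r2(0.5, b1, b2, beta1, beta2)
  g <- rnorm(n); u <- rnorm(n)
  x <- beta1 * g + beta2 * u + rnorm(n)
  y <- b1 * x + b2 * u + rnorm(n, sd = sqrt(var_e))
  cor(x, y)^2
})
r2_check
```

With these parameters the largest reachable \(R^2\) (at \(var(E) = 0\)) is about 0.78, so asking for 0.9 stops with an error rather than returning a negative variance.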
By contrast, the variance in Y explained by the causal effect of X is
\[ R^2_{IV, x,y} = b^2_1var(X) / var(Y) \]
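To contrast the two quantities, this sketch (its own seed, \(n = 10^5\), the parameter values from the simulation at the top and \(var(E) = 0\)) estimates the causal effect with the Wald ratio \(cov(G, Y)/cov(G, X)\), one standard single-instrument IV estimator, and compares \(R^2_{IV}\) with the OLS \(R^2\).

```r
# Contrast IV and OLS slopes, and the variance explained by each.
iv_check <- local({
  set.seed(4)
  n <- 1e5
  b1 <- 0.2; b2 <- 3; beta1 <- 4; beta2 <- 5   # values from the simulation above
  g <- rnorm(n); u <- rnorm(n)
  x <- beta1 * g + beta2 * u + rnorm(n)
  y <- b1 * x + b2 * u                          # var(E) = 0
  b_iv  <- cov(g, y) / cov(g, x)                # Wald ratio estimate of b1
  b_ols <- cov(x, y) / var(x)                   # confounded OLS slope
  c(b_iv   = b_iv,
    b_ols  = b_ols,
    r2_iv  = b1^2 * var(x) / var(y),            # variance explained by the causal effect
    r2_ols = cor(x, y)^2)
})
iv_check
```

Analytically, with these parameters \(R^2_{IV} = 1.68/16.68 \approx 0.10\) while the OLS \(R^2 \approx 0.78\), so most of the OLS association is carried by U.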
Check
Beta
cov(x, y)/var(x)
[1] 0.562716
summary(lm(y ~ x))$coef[2,1]
[1] 0.562716
(b1*beta1^2*var(g) + (b1*beta2^2 + b2*beta2) * var(u)) / var(x)
[1] 0.5557103
Correlation
(((b1*beta1^2*var(g) + (b1*beta2^2 + b2*beta2) * var(u)) / var(x)) * sd(x) / sd(y))^2
[1] 0.768337
cor(x, y)^2
[1] 0.7878316
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.3 compiler_4.3.2 fastmap_1.1.1 cli_3.6.1
[5] tools_4.3.2 htmltools_0.5.7 yaml_2.3.7 rmarkdown_2.25
[9] knitr_1.45 jsonlite_1.8.7 xfun_0.41 digest_0.6.33
[13] rlang_1.1.2 evaluate_0.23