# Standard error in two-stage least squares

statistics
Author

Gibran Hemani

Published

July 6, 2023

## Background

How to calculate the 2sls SE that is returned by e.g. ivreg

``````library(ivreg)
library(dplyr)``````
``````
Attaching package: 'dplyr'``````
``````The following objects are masked from 'package:stats':

filter, lag``````
``````The following objects are masked from 'package:base':

intersect, setdiff, setequal, union``````
``````set.seed(1234)

# Simulate
n <- 1000
z <- matrix(rnorm(n*2), n)
u <- rnorm(n)
x <-  z %*% c(0.1, 0.2) + rnorm(n) + u
y <- u + x * 0.3 + rnorm(n)

d <- tibble(z[,1], z[,2], x, y)
names(d) <- c("z1", "z2", "x", "y")
``````# A tibble: 6 × 4
z1     z2    x[,1]  y[,1]
<dbl>  <dbl>    <dbl>  <dbl>
1 -1.21  -1.21  -1.72    -2.45
2  0.277  0.301  0.502   -0.763
3  1.08  -1.54  -0.00212 -0.306
4 -2.35   0.635  2.92     3.99
5  0.429  0.703  0.121   -0.918
6  0.506 -1.91  -1.73    -0.595``````

Estimate with ivreg

``summary(ivreg(y ~ x | z1 + z2, data=d))``
``````
Call:
ivreg(formula = y ~ x | z1 + z2, data = d)

Residuals:
Min       1Q   Median       3Q      Max
-4.77224 -0.95740 -0.02718  1.02122  4.52467

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.008573   0.047024  -0.182    0.855
x            0.223361   0.231282   0.966    0.334

Diagnostic tests:
df1 df2 statistic  p-value
Weak instruments   2 997    10.249 3.93e-05 ***
Wu-Hausman         1 997    10.298  0.00137 **
Sargan             1  NA     0.114  0.73617
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.48 on 998 degrees of freedom
Multiple R-Squared: 0.2276, Adjusted R-squared: 0.2268
Wald test: 0.9327 on 1 and 998 DF,  p-value: 0.3344 ``````

Manual estimation based on https://stats.stackexchange.com/questions/265780/calculation-of-iv-standard-errors-using-r

``````Pz <- z %*% solve(t(z) %*% z) %*% t(z)
bhat <- solve(t(x) %*% Pz %*% x) %*% t(x) %*% Pz %*% y
omega <- diag(n) * drop(var(y))
v <- solve(t(x) %*% Pz %*% x) %*% t(x) %*% Pz %*% omega %*% Pz %*% x %*% solve(t(x) %*% Pz %*% x)``````

Result

``bhat``
``````          [,1]
[1,] 0.2237422``````
``sqrt(v)``
``````          [,1]
[1,] 0.2631128``````

bhat matches, standard error seems off

Alternatively try a more standard approach - https://stats.stackexchange.com/questions/472144/how-to-manually-calculate-standard-errors-for-instrumental-variables

``````xhat <- z %*% solve(t(z) %*% z) %*% t(z) %*% x
bhat <- solve(t(xhat) %*% xhat) %*% t(xhat) %*% y
e <- y - x %*% bhat
C <- sum(e^2) / n * solve(t(xhat) %*% xhat)``````

Result

``bhat``
``````          [,1]
[1,] 0.2237422``````
``sqrt(C)``
``````          [,1]
[1,] 0.2310808``````

This matches ivreg more closely

``sessionInfo()``
``````R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
 en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
 stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 dplyr_1.1.2 ivreg_0.6-2

loaded via a namespace (and not attached):
 vctrs_0.6.2       cli_3.6.1         knitr_1.43        rlang_1.1.1
 xfun_0.39         Formula_1.2-5     car_3.1-2         generics_0.1.3
 jsonlite_1.8.5    glue_1.6.2        zoo_1.8-12        htmltools_0.5.5
 lmtest_0.9-40     fansi_1.0.4       rmarkdown_2.22    grid_4.3.0
 tibble_3.2.1      evaluate_0.21     carData_3.0-5     abind_1.4-5
 MASS_7.3-58.4     fastmap_1.1.1     lifecycle_1.0.3   yaml_2.3.7
 compiler_4.3.0    pkgconfig_2.0.3   htmlwidgets_1.6.2 rstudioapi_0.14
 lattice_0.21-8    digest_0.6.31     R6_2.5.1          tidyselect_1.2.0
 utf8_1.2.3        pillar_1.9.0      magrittr_2.0.3    tools_4.3.0      ``````