Standard error in two-stage least squares

statistics
Author

Gibran Hemani

Published

July 6, 2023

Background

How to calculate the 2sls SE that is returned by e.g. ivreg

library(ivreg)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
set.seed(1234)

# Simulate
n <- 1000
z <- matrix(rnorm(n*2), n)
u <- rnorm(n)
x <-  z %*% c(0.1, 0.2) + rnorm(n) + u
y <- u + x * 0.3 + rnorm(n)

d <- tibble(z[,1], z[,2], x, y)
names(d) <- c("z1", "z2", "x", "y")
head(d)
# A tibble: 6 × 4
      z1     z2    x[,1]  y[,1]
   <dbl>  <dbl>    <dbl>  <dbl>
1 -1.21  -1.21  -1.72    -2.45 
2  0.277  0.301  0.502   -0.763
3  1.08  -1.54  -0.00212 -0.306
4 -2.35   0.635  2.92     3.99 
5  0.429  0.703  0.121   -0.918
6  0.506 -1.91  -1.73    -0.595

Estimate with ivreg

summary(ivreg(y ~ x | z1 + z2, data=d))

Call:
ivreg(formula = y ~ x | z1 + z2, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-4.77224 -0.95740 -0.02718  1.02122  4.52467 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.008573   0.047024  -0.182    0.855
x            0.223361   0.231282   0.966    0.334

Diagnostic tests:
                 df1 df2 statistic  p-value    
Weak instruments   2 997    10.249 3.93e-05 ***
Wu-Hausman         1 997    10.298  0.00137 ** 
Sargan             1  NA     0.114  0.73617    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.48 on 998 degrees of freedom
Multiple R-Squared: 0.2276, Adjusted R-squared: 0.2268 
Wald test: 0.9327 on 1 and 998 DF,  p-value: 0.3344 

Manual estimation based on https://stats.stackexchange.com/questions/265780/calculation-of-iv-standard-errors-using-r

Pz <- z %*% solve(t(z) %*% z) %*% t(z)
bhat <- solve(t(x) %*% Pz %*% x) %*% t(x) %*% Pz %*% y
omega <- diag(n) * drop(var(y))
v <- solve(t(x) %*% Pz %*% x) %*% t(x) %*% Pz %*% omega %*% Pz %*% x %*% solve(t(x) %*% Pz %*% x)

Result

bhat
          [,1]
[1,] 0.2237422
sqrt(v)
          [,1]
[1,] 0.2631128

bhat matches, standard error seems off

Alternatively try a more standard approach - https://stats.stackexchange.com/questions/472144/how-to-manually-calculate-standard-errors-for-instrumental-variables

xhat <- z %*% solve(t(z) %*% z) %*% t(z) %*% x
bhat <- solve(t(xhat) %*% xhat) %*% t(xhat) %*% y
e <- y - x %*% bhat
C <- sum(e^2) / n * solve(t(xhat) %*% xhat)

Result

bhat
          [,1]
[1,] 0.2237422
sqrt(C)
          [,1]
[1,] 0.2310808

This matches ivreg more closely


sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.2 ivreg_0.6-2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.2       cli_3.6.1         knitr_1.43        rlang_1.1.1      
 [5] xfun_0.39         Formula_1.2-5     car_3.1-2         generics_0.1.3   
 [9] jsonlite_1.8.5    glue_1.6.2        zoo_1.8-12        htmltools_0.5.5  
[13] lmtest_0.9-40     fansi_1.0.4       rmarkdown_2.22    grid_4.3.0       
[17] tibble_3.2.1      evaluate_0.21     carData_3.0-5     abind_1.4-5      
[21] MASS_7.3-58.4     fastmap_1.1.1     lifecycle_1.0.3   yaml_2.3.7       
[25] compiler_4.3.0    pkgconfig_2.0.3   htmlwidgets_1.6.2 rstudioapi_0.14  
[29] lattice_0.21-8    digest_0.6.31     R6_2.5.1          tidyselect_1.2.0 
[33] utf8_1.2.3        pillar_1.9.0      magrittr_2.0.3    tools_4.3.0