If the individuals in my dataset are correlated with known correlation structure, how can I perform regression whilst accounting for that correlation structure?
Adding to the complication, in this case it is the h2 estimates of a set of traits being correlated against number of GWAS hits for a set of traits, where the traits are in some way correlated. So the estimates of e.g. h2 are estimated with sampling error, so to account for that perhaps need to parametric bootstrap.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
library(mvtnorm)#' Regression with samples that are not independent#'#' For our analysis we're regressing estimates against estimates,#' and so it's a bit more complicated because the estimates each have an SE.#' For now let's just ignore that but I think if we don't account for it we#' will get a bit of regression dilution bias.#'#' @param x Vector of x values#' @param y Vector of y values#' @param rho correlation matrix#' @param se_x SE of x evalues (Ignore)#' @param se_y SE of y values (Ignore)#' @param nboot Number of bootstraps to get standard error (Ignore)#'#' @return#' @exportreg_nonind <-function(x, y, rho, se_x=NULL, se_y=NULL, nboot=NULL){ X <-cbind(rep(1, length(x)), x) rho_inv <-solve(rho)#beta <- solve(t(X) %*% rho_inv %*% X) %*% t(X) %*% rho_inv %*% y beta <-ginv(t(X)%*%rho_inv %*%X)%*%t(X)%*%rho_inv %*%y yhat <- X %*% beta yres <-as.numeric(y - yhat) se <-as.numeric((t(yres) %*% rho_inv %*% yres) / (length(x)-qr(X)$rank)) *solve(t(X) %*% rho_inv %*% X) se <-sqrt(diag(se))return(tibble(param=c("intercept", "slope"), beta=beta, se=se, pval=pnorm(abs(beta)/se, lower.tail=FALSE) ))# get standard error via parametric bootstrap# betaboot <- matrix(0, nboot, 2)# for(i in 1:nboot)# {# X[,2] <- rnorm(length(x), mean=x, sd=se_x)# Y <- rnorm(length(y), mean=y, sd=se_y)# betaboot[i,] <- solve(t(X) %*% rho_inv %*% X) %*% t(X) %*% rho_inv %*% Y %>% as.numeric()# }# se_boot <- apply(betaboot, 2, sd)}x <-runif(300)y <-runif(300)rho <-diag(300)rho[lower.tri(rho)] <-rnorm(sum(lower.tri(rho)), sd=0.01)rho[upper.tri(rho)] <-t(rho)[upper.tri(rho)]reg_nonind(x, y, rho, 10)