Gibran Hemani’s lab book - Decomposing drug X side effect matrices

Background

Aim: predict drug side effects using Mendelian randomisation (MR).

Have a matrix of side effects x drugs.

A = n_se X m_drugs

Each drug binds some genes

B = p_genes X m_drugs

MR estimates of the effect of each gene on all traits

C = q_traits X p_genes

and a matrix linking trait terms to side effect terms

D = q_traits X n_se

Basic simulation

m=3 drugs
n=5 side effects
p=6 genes
q=10 traits

# gene x drug - e.g. based on binding affinities
# NB: matrix() fills column-wise by default, so the rows of B are not the rows as typed here
B <- matrix(c(
    0, 1, 1,
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    0, 0, 0,
    0, 1, 1
), 6, 3)
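
R's matrix() fills column by column unless byrow = TRUE is given, so the 6 x 3 matrix B is not laid out the way the values are typed above. A minimal sketch of the difference follows; `vals` and `B_byrow` are names introduced here for illustration only. Which layout was intended is not stated, but the outputs shown later in this lab book correspond to the default column-wise fill.

```r
# Same 18 values as used for B above
vals <- c(
    0, 1, 1,
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    0, 0, 0,
    0, 1, 1
)

# As in the lab book: default column-wise fill
B <- matrix(vals, 6, 3)

# Hypothetical alternative: fill row by row, matching the typed layout
B_byrow <- matrix(vals, 6, 3, byrow = TRUE)

# Compare the second gene's row under each fill order
B[2, ]
B_byrow[2, ]
```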

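# trait x se - matches trait names to side effect terms
# NB: with matrix()'s default column-wise fill, traits 3 and 8 map to side effects 3 and 4, and traits 5 and 10 map to none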
D <- matrix(c(
    1, 0, 0, 0, 0,
    1, 0, 0, 0, 0,
    0, 1, 0, 0, 0,
    0, 1, 0, 0, 0,
    0, 0, 1, 0, 0, 
    0, 0, 1, 0, 0,
    0, 0, 1, 0, 0,
    0, 0, 1, 0, 0,
    0, 0, 0, 1, 0,
    0, 0, 0, 1, 0
), 10, 5)

# True mapping of genes to side effects - we don't observe this
gse <- matrix(c(
    1, 0, 0, 0, 0, 0,
    1, 1, 0, 0, 0, 0,
    0, 0, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0,
    0, 0, 0, 0, 1, 0
), 5, 6)

# True side effect x drug matrix: gene side effects (gse) multiplied by gene-drug binding (B)
A <- gse %*% B
A

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    1    1
[3,]    1    1    1
[4,]    0    1    1
[5,]    1    0    0

We don’t actually see the gse matrix. If everything works as we hypothesise then the trait x gene matrix that we observe would follow:

C <- D %*% gse
C

      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    0    0    0    0    0
 [2,]    0    1    0    0    1    0
 [3,]    0    1    0    0    1    1
 [4,]    0    0    1    0    0    0
 [5,]    0    0    0    0    0    0
 [6,]    1    0    0    0    0    0
 [7,]    0    1    0    0    1    0
 [8,]    0    1    0    0    1    1
 [9,]    0    0    1    0    0    0
[10,]    0    0    0    0    0    0

Now we have B, C and D. How do we get back to A? Since C = D %*% gse and A = gse %*% B, we need to undo D to get gse back from C and then multiply by B. D isn't square, so use the Moore-Penrose pseudoinverse:

library(pracma)
Ahat <- pinv(D) %*% C %*% B

Does the prediction match the true A?

cor(c(Ahat), c(A))

[1] 0.9279607

plot(Ahat, A)
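
One way to see where Ahat departs from A is to look at the rank of D and at pinv(D) %*% D, since Ahat = pinv(D) %*% D %*% gse %*% B. A minimal base R sketch, rebuilding the simulation inputs so that it runs on its own; `pinv_svd` is a hypothetical stand-in for pracma::pinv written here for illustration, and `rank_D` and `P` are likewise names introduced here.

```r
# Simulation inputs exactly as defined above (default column-wise fill)
B <- matrix(c(0,1,1, 1,0,0, 0,1,0, 0,0,1, 0,0,0, 0,1,1), 6, 3)
D <- matrix(c(
    1,0,0,0,0, 1,0,0,0,0, 0,1,0,0,0, 0,1,0,0,0, 0,0,1,0,0,
    0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,0,1,0, 0,0,0,1,0
), 10, 5)
gse <- matrix(c(
    1,0,0,0,0,0, 1,1,0,0,0,0, 0,0,1,0,0,0,
    0,0,0,1,1,0, 0,0,0,0,1,0
), 5, 6)
A <- gse %*% B
C <- D %*% gse

# Hypothetical helper: Moore-Penrose pseudoinverse via SVD (base R only)
pinv_svd <- function(M, tol = 1e-8) {
    s <- svd(M)
    keep <- s$d > tol * max(s$d)   # drop numerically zero singular values
    s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Numerical rank of D, to compare with its number of columns
rank_D <- qr(D)$rank
c(rank = rank_D, n_columns = ncol(D))

# P would be the identity matrix if D had full column rank
P <- pinv_svd(D) %*% D
round(P, 2)

# Ahat is A pre-multiplied by P, so rows of A are mixed wherever P is not the identity
Ahat <- pinv_svd(D) %*% C %*% B
max(abs(Ahat - P %*% A))
```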

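Quite close, but the pseudoinverse fails to recover some of the values. D is not of full column rank here (its columns 3 and 4 are identical), so pinv(D) %*% D is not the identity and side effects 3 and 4 cannot be separated. An alternative to the pseudoinverse is to manually re-label trait names with their side effect terms.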
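
One possible reading of the re-labelling idea, sketched under assumptions: each trait is given the side effect term(s) it is linked to in D, and the rows of C that share a term are averaged. The names `n_traits_per_se`, `gse_relabel` and `A_relabel` are introduced here for illustration, and the averaging rule is an assumption rather than something specified above.

```r
# Simulation inputs as defined earlier (default column-wise fill)
B <- matrix(c(0,1,1, 1,0,0, 0,1,0, 0,0,1, 0,0,0, 0,1,1), 6, 3)
D <- matrix(c(
    1,0,0,0,0, 1,0,0,0,0, 0,1,0,0,0, 0,1,0,0,0, 0,0,1,0,0,
    0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,0,1,0, 0,0,0,1,0
), 10, 5)
gse <- matrix(c(
    1,0,0,0,0,0, 1,1,0,0,0,0, 0,0,1,0,0,0,
    0,0,0,1,1,0, 0,0,0,0,1,0
), 5, 6)
C <- D %*% gse

# Number of traits linked to each side effect term
n_traits_per_se <- colSums(D)

# Sum the rows of C within each side effect term, then divide each row by the
# number of traits carrying that term (assumes every term has at least one trait)
gse_relabel <- (t(D) %*% C) / n_traits_per_se

# Predicted side effect x drug matrix from the re-labelled gene effects
A_relabel <- gse_relabel %*% B
A_relabel
```

In this simulation traits 3 and 8 are linked to both side effects 3 and 4, so rows 3 and 4 of `gse_relabel` are identical and re-labelling alone cannot tell those two side effects apart.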

sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pracma_2.4.2

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.2 compiler_4.3.0    fastmap_1.1.1     cli_3.6.1        
 [5] tools_4.3.0       htmltools_0.5.5   rstudioapi_0.14   yaml_2.3.7       
 [9] rmarkdown_2.22    knitr_1.43        xfun_0.39         digest_0.6.31    
[13] jsonlite_1.8.7    rlang_1.1.1       evaluate_0.21