library(MASS)
<- 20
m <- matrix(runif(m^2)*2-1, ncol=m)
A <- t(A) %*% A
sigma <- mvrnorm(1000, rep(0, m), sigma) gen
Background
Need to be able to calculate genetic principal components in unrelateds and then project into relateds.
Simulate
Generate in full sample
<- princomp(gen)
pcafull names(pcafull)
[1] "sdev" "loadings" "center" "scale" "n.obs" "scores" "call"
Generate in 90%
<- princomp(gen[1:900,]) pca90
Project into remaining 10%
<- gen[901:1000,] %*% pca90$loadings pca10
correlate full with projected
diag(cor(pca10, pcafull$scores[901:1000,]))
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
0.9979686 0.9869467 0.9829454 0.9843170 0.9927274 0.9905525 0.9942163 0.9960611
Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16
0.9994964 0.9965984 0.9981390 0.9818166 0.9838703 0.9984871 0.9972718 0.9968159
Comp.17 Comp.18 Comp.19 Comp.20
0.9982290 0.9993551 0.9991197 0.9989832
This works fine. Looks like GCTA does something similar here: https://yanglab.westlake.edu.cn/software/gcta/#PCloadingandprojection
sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MASS_7.3-58.4
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.2 compiler_4.3.0 fastmap_1.1.1 cli_3.6.1
[5] tools_4.3.0 htmltools_0.5.5 rstudioapi_0.14 yaml_2.3.7
[9] rmarkdown_2.22 knitr_1.43 xfun_0.39 digest_0.6.31
[13] jsonlite_1.8.4 rlang_1.1.1 evaluate_0.21