LD induced by assortative mating

Author

Gibran Hemani

Published

April 16, 2024

Background

Positive LD is predicted by assortative mating? https://www.youtube.com/watch?v=6m-mLvNQxJA

Assortative mating induces a form of selection bias in that of all possible mating pairs, the observed pairs are not a random sample, they are those that are more similar to each other than would be expected by chance. And therefore the children of those pairs inherit alleles that have not been randomly sampled.

Though this is not a straightforward form of selection where e.g. a higher phenotypic value leads to more chance of inclusion in the population, in which case you would expect negative correlation amongst the SNPs influencing the trait.

Simulate families with assortative mating and calculate LD between SNPs.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
make_families <- function(af, nfam) {
    nsnp <- length(af)
    dads <- matrix(0, nfam, nsnp)
    mums <- matrix(0, nfam, nsnp)
    sibs1 <- matrix(0, nfam, nsnp)
    sibs2 <- matrix(0, nfam, nsnp)
    ibd <- matrix(0, nfam, nsnp)
    ibs <- matrix(0, nfam, nsnp)
    for(i in 1:nsnp)
    {
        dad1 <- rbinom(nfam, 1, af[i]) + 1
        dad2 <- (rbinom(nfam, 1, af[i]) + 1) * -1
        mum1 <- rbinom(nfam, 1, af[i]) + 1
        mum2 <- (rbinom(nfam, 1, af[i]) + 1) * -1

        dadindex <- sample(c(TRUE, FALSE), nfam, replace=TRUE)
        dadh <- rep(NA, nfam)
        dadh[dadindex] <- dad1[dadindex]
        dadh[!dadindex] <- dad2[!dadindex]

        mumindex <- sample(c(TRUE, FALSE), nfam, replace=TRUE)
        mumh <- rep(NA, nfam)
        mumh[mumindex] <- mum1[mumindex]
        mumh[!mumindex] <- mum2[!mumindex]

        sib1 <- cbind(dadh, mumh)

        dadindex <- sample(c(TRUE, FALSE), nfam, replace=TRUE)
        dadh <- rep(NA, nfam)
        dadh[dadindex] <- dad1[dadindex]
        dadh[!dadindex] <- dad2[!dadindex]

        mumindex <- sample(c(TRUE, FALSE), nfam, replace=TRUE)
        mumh <- rep(NA, nfam)
        mumh[mumindex] <- mum1[mumindex]
        mumh[!mumindex] <- mum2[!mumindex]

        sib2 <- cbind(dadh, mumh)


        sibs1[,i] <- rowSums(abs(sib1) - 1)
        sibs2[,i] <- rowSums(abs(sib2) - 1)
        dads[,i] <- dad1 - 1 + abs(dad2) - 1
        mums[,i] <- mum1 - 1 + abs(mum2) - 1

    }
    return(list(dads=dads, mums=mums, sibs1=sibs1, sibs2=sibs2))
}

makePhen <- function(effs, indep, vy=1, vx=rep(1, length(effs)), my=0) {
    if(is.null(dim(indep))) indep <- cbind(indep)
    stopifnot(ncol(indep) == length(effs))
    stopifnot(length(vx) == length(effs))
    cors <- effs * vx / sqrt(vx) / sqrt(vy)
    stopifnot(sum(cors^2) <= 1)
    cors <- c(cors, sqrt(1-sum(cors^2)))
    indep <- t(t(scale(cbind(indep, rnorm(nrow(indep))))) * cors * c(vx, 1))
    y <- drop(scale(rowSums(indep)) * sqrt(vy)) + my
    return(y)
}

make_phenotypes <- function(fam, eff_gx, vx, mx) {
    lapply(fam, function(g)
    {
        x <- makePhen(c(eff_gx), cbind(g), vy=vx, my=mx)
        return(data.frame(x=x))
    })
}

Generate 100000 families with 3 SNPs each with MAF 0.5

a <- make_families(c(0.5, 0.5, 0.5), 100000)

Generate phenotypes where those three SNPs explain most of the variance

b <- make_phenotypes(a, sqrt(c(0.3, 0.3, 0.3)), 1, 0)
str(b)
List of 4
 $ dads :'data.frame':  100000 obs. of  1 variable:
  ..$ x: num [1:100000] 0.0211 -1.2521 -0.9917 0.1754 -0.9217 ...
  .. ..- attr(*, "scaled:center")= num -6.26e-17
  .. ..- attr(*, "scaled:scale")= num 1
 $ mums :'data.frame':  100000 obs. of  1 variable:
  ..$ x: num [1:100000] 0.174 0.899 1.341 -0.173 -0.135 ...
  .. ..- attr(*, "scaled:center")= num 4.89e-17
  .. ..- attr(*, "scaled:scale")= num 1
 $ sibs1:'data.frame':  100000 obs. of  1 variable:
  ..$ x: num [1:100000] 0.509 0.192 1.35 -1.571 -0.51 ...
  .. ..- attr(*, "scaled:center")= num -2.44e-17
  .. ..- attr(*, "scaled:scale")= num 1
 $ sibs2:'data.frame':  100000 obs. of  1 variable:
  ..$ x: num [1:100000] -0.368 -0.632 0.176 0.416 -0.44 ...
  .. ..- attr(*, "scaled:center")= num -2.02e-17
  .. ..- attr(*, "scaled:scale")= num 0.994

Select spouse pairs with most similar phenotypes

ssd <- (b$dads$x - b$mums$x)^2
sel <- which(ssd < quantile(ssd, 0.1))

Calculate LD between SNPs

  1. Spouse pairs - expect positive correlation
cor(a$dads[,1], a$mums[,1])
[1] 0.002640232
cor(a$dads[sel,1], a$mums[sel,1])
[1] 0.185586

Positive correlation observed

  1. Within fathers only - expect negative correlation
cor(a$dads) %>% round(2)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
cor(a$dads[sel,]) %>% round(2)
      [,1]  [,2]  [,3]
[1,]  1.00 -0.19 -0.18
[2,] -0.19  1.00 -0.19
[3,] -0.18 -0.19  1.00

Negative correlations observed

  1. Within siblings - expect positive correlation
cor(a$sibs1) %>% round(2)
     [,1] [,2] [,3]
[1,] 1.00    0 0.01
[2,] 0.00    1 0.00
[3,] 0.01    0 1.00
cor(a$sibs1[sel,]) %>% round(2)
      [,1]  [,2] [,3]
[1,]  1.00 -0.01 0.01
[2,] -0.01  1.00 0.00
[3,]  0.01  0.00 1.00

Not seeing any correlation


sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4

loaded via a namespace (and not attached):
 [1] digest_0.6.34     utf8_1.2.4        R6_2.5.1          fastmap_1.1.1    
 [5] tidyselect_1.2.0  xfun_0.42         magrittr_2.0.3    glue_1.7.0       
 [9] tibble_3.2.1      knitr_1.45        pkgconfig_2.0.3   htmltools_0.5.7  
[13] rmarkdown_2.26    generics_0.1.3    lifecycle_1.0.4   cli_3.6.2        
[17] fansi_1.0.6       vctrs_0.6.5       compiler_4.3.3    tools_4.3.3      
[21] pillar_1.9.0      evaluate_0.23     yaml_2.3.8        rlang_1.1.3      
[25] jsonlite_1.8.8    htmlwidgets_1.6.3