Power of GWAS in ascertained case control datasets

Author

Gibran Hemani

Published

July 24, 2022

Case-control studies ascertain a fixed number of cases and controls. This changes the distribution of genetic liability in the selected sample. For example, if the prevalence is low, the cases are drawn from the upper tail of the liability distribution and so follow a truncated distribution, while the controls follow a distribution that is depleted of values in that tail (see https://pubmed.ncbi.nlm.nih.gov/21376301/ for illustrations).

The rarer the disease, the larger the variance of the liability in a sample where the numbers of cases and controls are matched. This should improve statistical power, because the cases and controls are ascertained to be more genetically distinct from each other.
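The variance claim can be checked with a short base R sketch. It assumes a simplified liability threshold model that is not part of the simulation in this post: total liability is standard normal, disease is a hard threshold at the upper `prev` quantile, and the sample has equal numbers of cases and controls. The helper name `matched_liability_var` is hypothetical. It uses the standard mean and variance formulas for a truncated normal distribution.

```r
# Hypothetical helper: variance of liability in a 50:50 case-control sample,
# assuming standard-normal liability and a hard disease threshold.
matched_liability_var <- function(prev) {
  thresh <- qnorm(1 - prev)            # liability threshold for this prevalence
  z <- dnorm(thresh)                   # normal density at the threshold
  mean_case <- z / prev                # mean liability above the threshold
  mean_ctrl <- -z / (1 - prev)         # mean liability below the threshold
  var_case <- 1 + thresh * mean_case - mean_case^2
  var_ctrl <- 1 + thresh * mean_ctrl - mean_ctrl^2
  # Within-group variance plus between-group variance for an equal mixture
  0.5 * var_case + 0.5 * var_ctrl + 0.25 * (mean_case - mean_ctrl)^2
}

# Variance grows as prevalence falls; at prev = 0.5 the sample is
# representative of the population, so the variance is 1.
sapply(c(0.001, 0.01, 0.1, 0.5), matched_liability_var)
```

Under these assumptions most of the extra variance comes from the between-group term, i.e. the gap in mean liability between cases and controls.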

However, the Genetic Power Calculator concludes the opposite: as prevalence gets lower, the power goes down (https://zzz.bwh.harvard.edu/gpc/cc2.html). For example, for OR=1.1, ncase=1000, ncontrol=1000, af=0.5, for 80% power:

prev = 0.001, power = 4e-5

prev = 0.4, power = 0.71

Quick simulation to investigate:

library(simulateGP)
library(dplyr)


library(ggplot2)

Generate a function that will:

1. create a population with some genetic liability
2. stochastically assign disease status based on heritability and prevalence
3. ascertain cases and controls
4. identify how many significant associations there are in the case/control sample
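The first three steps can be sketched in base R without any packages. This is only an illustration under stated assumptions: the genetic liability is drawn directly from a standard normal rather than built from genotypes, and the hypothetical helper `liability_to_prob` uses one common liability threshold parameterisation, which is not necessarily what `gx_to_gp` in simulateGP does internally.

```r
set.seed(1)

# Hypothetical helper: probability of disease given standardised genetic
# liability l, assuming total liability = sqrt(hsq) * l + sqrt(1 - hsq) * noise
# and disease occurs when total liability exceeds the prevalence threshold.
liability_to_prob <- function(l, hsq, prev) {
  thresh <- qnorm(1 - prev)
  pnorm((sqrt(hsq) * l - thresh) / sqrt(1 - hsq))
}

n <- 100000
hsq <- 0.5
prev <- 0.1

l <- rnorm(n)                         # step 1: stand-in for a scaled genetic liability
p <- liability_to_prob(l, hsq, prev)  # step 2: disease probability per individual
d <- rbinom(n, 1, p)                  #         stochastic disease status

# Step 3: ascertain the first ncase cases and the first ncontrol controls
ncase <- 1000
ncontrol <- 1000
keep <- c(which(d == 1)[seq_len(ncase)], which(d == 0)[seq_len(ncontrol)])

mean(p)                         # close to prev in the full population
mean(d[keep])                   # 0.5 in the ascertained sample
tapply(l[keep], d[keep], mean)  # cases carry higher genetic liability
```

The population must be large enough to contain the requested number of cases; here roughly `n * prev` cases are expected, well above `ncase`.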

sims <- function(ncase, ncontrol, nsnp, prev, hsq = 0.5, thresh = 5e-8) {
  # Determine minimum sample size required to ascertain required number of cases and controls
  n_req <- round(max(ncase / prev, ncontrol / prev) + 10000)
  # Generate matrix of genotype values
  g <- make_geno(n_req, nsnp, 0.5)
  # Effect sizes for each SNP
  b <- rnorm(nsnp)
  dat <- tibble(
    id = 1:n_req,
    l = scale(g %*% b),          # genetic liability
    p = gx_to_gp(l, hsq, prev),  # convert to disease probability
    d = rbinom(n_req, 1, p)      # sample disease status from probability
  )
  # Ascertain cases (d == 1) and controls (d == 0)
  dat <- rbind(
    subset(dat, d == 1)[1:ncase, ],
    subset(dat, d == 0)[1:ncontrol, ]
  )
  # Perform GWAS
  res <- gwas(dat$d, g[dat$id, ], logistic = TRUE)
  # Count number of significant associations
  return(sum(res$pval < thresh))
}
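One way to use a function like this is to run it over a grid of prevalences with a few replicates each and average the number of significant hits. The sketch below is an assumption about how that might be organised, not the author's code: `run_grid` is a hypothetical helper that accepts any simulation function taking a `prev` argument, so it can be tried with a cheap stand-in before running the real simulation.

```r
# Hypothetical helper: run sim_fn over several prevalences, nrep times each,
# and return the mean number of significant associations per prevalence.
# Extra arguments in ... are passed through to sim_fn.
run_grid <- function(sim_fn, prevs, nrep, ...) {
  grid <- expand.grid(prev = prevs, rep = seq_len(nrep))
  grid$nsig <- vapply(
    grid$prev,
    function(p) sim_fn(prev = p, ...),
    numeric(1)
  )
  aggregate(nsig ~ prev, data = grid, FUN = mean)
}

# Cheap stand-in simulation, used only to show the shape of the output
toy_sim <- function(prev, ...) 10 * prev
out <- run_grid(toy_sim, prevs = c(0.1, 0.4), nrep = 3)
print(out)

# With the simulation function defined above (requires simulateGP; not run here):
# run_grid(sims, prevs = c(0.01, 0.1, 0.4), nrep = 5,
#          ncase = 1000, ncontrol = 1000, nsnp = 100)
```

Low prevalences make the simulated population large (roughly `ncase / prev` individuals), so memory and run time grow quickly as `prev` falls.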