Liftover gwasvcf

Author

Gibran Hemani

Published

September 25, 2024

Background

I needed an efficient way to lift over gwas-vcf files from hg19 to hg38. The first attempt used Picard, which requires a sequence dictionary for the target reference:

~/bin/java/jdk-23/bin/java -jar ~/bin/picard.jar CreateSequenceDictionary -R hg38.fa.gz -O hg38.dict

# Generic form of the same command (legacy Picard syntax)
java -jar picard.jar CreateSequenceDictionary \
      R=reference.fasta \
      O=reference.dict

~/bin/java/jdk-23/bin/java -jar ~/bin/picard.jar LiftoverVcf \
     -I ieu-a-2.vcf.gz \
     -O lifted_over.vcf \
     -CHAIN hg19ToHg38.over.chain.gz \
     -REJECT rejected_variants.vcf \
     -R hg38.fa.gz

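This produced no lifted variants, probably because the chromosome names did not match (presumably names such as 1 in the gwas-vcf file versus chr1 in the UCSC chain file, although this was not confirmed). The bcftools +liftover plugin seems to be much more reliable: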
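One way to test the chromosome-name explanation is to compare the contig names used by the VCF records with the source-assembly names in the chain file. The sketch below assumes a gzip-readable VCF and a UCSC-format chain file, where the third field of each "chain" header line names the source contig. The function names are hypothetical helpers, not part of Picard or bcftools.

```shell
# Source-assembly contig names from a UCSC chain file
# (3rd field of every line that starts with "chain").
chain_contigs() {
  gzip -cd "$1" | awk '$1 == "chain" { print $3 }' | sort -u
}

# Contig names actually used by records in a gzip/bgzip-compressed VCF.
vcf_contigs() {
  gzip -cd "$1" | awk '!/^#/ { print $1 }' | sort -u
}

# Contigs that appear in the VCF but not in the chain file.
unmatched_contigs() {
  tmp_v=$(mktemp)
  tmp_c=$(mktemp)
  vcf_contigs "$1" > "$tmp_v"
  chain_contigs "$2" > "$tmp_c"
  comm -23 "$tmp_v" "$tmp_c"
  rm -f "$tmp_v" "$tmp_c"
}

# Usage (file names from this post):
# unmatched_contigs ieu-a-2.vcf.gz hg19ToHg38.over.chain.gz
```

If every contig in the VCF is listed as unmatched, a naming mismatch is a likely cause of an empty liftover result.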

# Install bcftools http://samtools.github.io/bcftools/howtos/install.html
# Download plugin binaries https://software.broadinstitute.org/software/score/
export BCFTOOLS_PLUGINS=/path/to/bcftools-plugins && bcftools +liftover
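# Download hg19.fa
# (note: the liftover commands below pass hs37d5.fa as the source reference via -s; that file is not downloaded here)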
wget https://genvisis.umn.edu/rsrc/Genome/hg19/hg19.fa
# Download hg38.fa
wget https://genvisis.umn.edu/rsrc/Genome/hg38/hg38.fa
# Download chain file
wget https://hgdownload.soe.ucsc.edu/gbdb/hg19/liftOver/hg19ToHg38.over.chain.gz

# Example gwas-vcf file
wget https://gwas.mrcieu.ac.uk/files/ieu-a-2/ieu-a-2.vcf.gz

# Example liftover
bcftools +liftover --no-version -Ou ieu-a-2.vcf.gz -- \
  -s hs37d5.fa \
  -f hg38.fa \
  -c hg19ToHg38.over.chain.gz \
  --reject ieu-a-2-reject.vcf.gz \
  --reject-type z |
  bcftools sort -Oz -o ieu-a-2-hg38.vcf.gz -W=tbi
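To apply the same command to several gwas-vcf files, it can be wrapped in a small function. This is a minimal sketch that reuses only the flags shown above; the function name, the DRY_RUN switch, and the output naming scheme are assumptions made for illustration.

```shell
# Hypothetical helper: lift one gwas-vcf file from hg19 to hg38.
# Output and reject file names are derived from the input name.
# Set DRY_RUN=1 to print the derived names without running bcftools.
liftover_gwasvcf() {
  in="$1"
  base="${in%.vcf.gz}"
  out="${base}_hg38.vcf.gz"
  rej="${base}_reject.vcf.gz"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$in -> $out (rejects: $rej)"
    return 0
  fi
  # Same command as in the example, with the file names parameterised.
  bcftools +liftover --no-version -Ou "$in" -- \
    -s hs37d5.fa \
    -f hg38.fa \
    -c hg19ToHg38.over.chain.gz \
    --reject "$rej" \
    --reject-type z |
    bcftools sort -Oz -o "$out" -W=tbi
}

# Usage: list the input files explicitly so that previously
# written *_hg38.vcf.gz outputs are not picked up again.
# for f in ieu-a-2.vcf.gz; do liftover_gwasvcf "$f"; done
```

The reference and chain file paths are hard-coded here for brevity; they could equally be passed as arguments.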



# Actual run on this machine, with the plugin directory set explicitly
export BCFTOOLS_PLUGINS=~/bin/bcftools-plugins
bcftools

bcftools +liftover --no-version -Ou ieu-a-2.vcf.gz -- \
  -s hs37d5.fa \
  -f hg38.fa \
  -c hg19ToHg38.over.chain.gz \
  --reject reject.vcf.gz \
  --reject-type z |
  bcftools sort -Oz -o ieu-a-2_hg38.vcf.gz -W=tbi

The liftover processed about 2 million variants in roughly 1 minute.
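To see how many variants were lifted and how many were rejected, the records in the two output files can be counted. The sketch below assumes both files are gzip-readable VCFs; the function names are hypothetical, and counting is done by decompressing the whole file rather than by reading an index.

```shell
# Count data records (non-header lines) in a gzip/bgzip-compressed VCF.
# grep exits non-zero when the count is 0, so "|| true" keeps the
# function usable in scripts that run with "set -e".
count_records() {
  gzip -cd "$1" | grep -vc '^#' || true
}

# Print lifted, rejected, and combined record counts.
liftover_summary() {
  lifted=$(count_records "$1")
  rejected=$(count_records "$2")
  total=$((lifted + rejected))
  echo "lifted=$lifted rejected=$rejected total=$total"
}

# Usage (file names from the run above):
# liftover_summary ieu-a-2_hg38.vcf.gz reject.vcf.gz
```

A lifted count of zero would reproduce the symptom seen with Picard and points to a problem with the inputs rather than with individual variants.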


sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.1    fastmap_1.2.0     cli_3.6.2        
 [5] tools_4.4.1       htmltools_0.5.8.1 yaml_2.3.8        rmarkdown_2.27   
 [9] knitr_1.47        jsonlite_1.8.8    xfun_0.44         digest_0.6.35    
[13] rlang_1.1.3       evaluate_0.23