
GATK

GATK is a collection of command-line tools for analysing high-throughput sequencing data, with a primary focus on variant discovery.

GATK is available as a module on Apocrita.

Usage

To run the default installed version of GATK, simply load the gatk module:

$ module load gatk
$ gatk -h

Usage:   gatk <subcommand> [arguments]

For full usage documentation, run gatk -h.
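
Beyond the top-level help, the gatk wrapper can list its tools and print the arguments of a single tool. The following is a minimal sketch rather than output captured on Apocrita: the `tool` variable is only illustrative, and the snippet assumes the gatk module has already been loaded.

```shell
#!/bin/bash
# Illustrative variable: any tool name printed by `gatk --list` can be used.
tool="HaplotypeCaller"

if command -v gatk >/dev/null 2>&1; then
    gatk --list            # print the names of all available tools
    gatk "$tool" --help    # print the arguments accepted by one tool
else
    echo "gatk not found; run 'module load gatk' first" >&2
fi
```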

Example job

Serial job

Here is an example job running on 1 core and 2GB of memory:

#!/bin/bash
#SBATCH --ntasks=1        # (or -n 1) Request 1 core
#SBATCH --mem-per-cpu=2G  # Request 2GB RAM per core (2GB total)
#SBATCH --time=1:0:0      # (or -t 1:0:0) Request 1 hour runtime

module load gatk

# Run HaplotypeCaller in default mode on a single input BAM file containing
# sequence data, writing the variant calls to a VCF file.
gatk HaplotypeCaller -R reference.fasta -I sample1.bam -O variants.vcf
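
GATK runs on the Java virtual machine, so it can help to cap the Java heap below the memory requested from the scheduler. The following is a minimal sketch under stated assumptions, not a recommended site configuration: `heap_opts` is a hypothetical helper, the 75% figure is an assumed rule of thumb, and the script assumes Slurm exports SLURM_MEM_PER_CPU in megabytes when --mem-per-cpu is used. It is intended to replace the final gatk line of the job script, after module load gatk.

```shell
#!/bin/bash
# Hypothetical helper: turn a memory request in MB into a JVM heap option,
# keeping roughly 25% back for non-heap JVM memory (assumed rule of thumb).
heap_opts() {
    local mem_mb="$1"
    echo "-Xmx$(( mem_mb * 3 / 4 ))m"
}

# Assumption: SLURM_MEM_PER_CPU holds the --mem-per-cpu request in MB.
# Fall back to 2048 MB (the 2G requested above) when it is unset.
java_opts="$(heap_opts "${SLURM_MEM_PER_CPU:-2048}")"
echo "JVM options: ${java_opts}"

if command -v gatk >/dev/null 2>&1; then
    # --java-options passes the quoted string through to the JVM.
    gatk --java-options "${java_opts}" HaplotypeCaller \
        -R reference.fasta -I sample1.bam -O variants.vcf
else
    echo "gatk not found; run 'module load gatk' first" >&2
fi
```

If the job is killed for exceeding its memory request, increasing --mem-per-cpu is usually a better first step than lowering the heap fraction.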
