Stacks
Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
Conda installation
Stacks can be installed from the Bioconda channel: load the Miniforge module, create a Conda environment, and install Stacks into it (output below truncated):
$ module load miniforge
$ mamba create --quiet --yes --name stacks_env
$ mamba activate stacks_env
(stacks_env) $ mamba install bioconda::stacks
Looking for: ['bioconda::stacks']
...
Updating specs:
- bioconda::stacks
...
Confirm changes: [Y/n] Y
...
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Usage
To run Stacks, load the miniforge module and activate the Conda environment you installed it into:
module load miniforge
mamba activate stacks_env
After activating the Conda environment, the following commands are available:
process_radtags     Examines raw reads from an Illumina sequencing run: it
                    first checks that the barcode and the RAD cutsite are
                    intact, then demultiplexes the data.
process_shortreads  Performs the same task as process_radtags, but for fast
                    cleaning of randomly sheared genomic or transcriptomic
                    data rather than RAD data.
clone_filter        Designed to identify PCR clones.
kmer_filter         Allows paired or single-end reads to be filtered according
                    to the number of rare or abundant kmers they contain.
ustacks             Takes as input a set of short-read sequences and aligns
                    them into exactly-matching stacks (or putative alleles).
cstacks             Builds a catalog from any set of samples processed by the
                    ustacks or pstacks programs.
sstacks             Searches sets of stacks (i.e. putative loci) constructed
                    by ustacks against a catalog produced by cstacks.
tsv2bam             Transposes data so that it is oriented by locus instead
                    of by sample.
gstacks             Examines a RAD data set one locus at a time, looking at
                    all individuals in the metapopulation for that locus.
populations         Analyzes a population of individual samples, computing a
                    number of population genetics statistics and exporting
                    a variety of standard output formats.
The following scripts are included in the Stacks package and allow preset pipelines to be run:
denovo_map.pl
ref_map.pl
For usage documentation, run <command> -h and see the extensive online documentation.
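As a quick sanity check that the environment is active, the commands listed above can be looked up on the PATH. The following is a minimal sketch, not part of Stacks: check_commands is a hypothetical helper name, and it only tests that each name resolves to an executable, not that the program runs correctly.

```shell
#!/bin/bash
# Sketch: report which of the named commands are missing from PATH.
# check_commands is a hypothetical helper, not part of Stacks.
# Returns 0 if every command is found, 1 otherwise.
check_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "${cmd}" > /dev/null 2>&1; then
            echo "missing: ${cmd}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Example call, to be run after 'mamba activate stacks_env':
# check_commands process_radtags ustacks cstacks sstacks tsv2bam \
#     gstacks populations denovo_map.pl ref_map.pl
```

If any name is reported as missing, the most likely cause is that the Conda environment has not been activated in the current shell.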
Example job
Serial job
The simplest way to execute the entire Stacks pipeline is to run it via the denovo_map.pl program.
Here is an example job running on 2 cores with 8GB of memory in total (4GB per core):
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=4G
module load miniforge
mamba activate stacks_env
denovo_map.pl -T ${NSLOTS} -M 4 -n 4 -o ./stacks/ \
--samples ./samples --popmap ./popmaps/popmap
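The job above expects a population map at ./popmaps/popmap. A Stacks population map is a tab-separated text file with one sample per line: the sample name (the file name without its extension) followed by its population. The sketch below builds one from the file names in the samples directory. It assumes single-end files named with the hypothetical scheme <population>_<id>.fq.gz, and make_popmap is an illustrative helper, not a Stacks command.

```shell
#!/bin/bash
# Sketch: build a Stacks population map from sample file names.
# Assumes a hypothetical naming scheme <population>_<id>.fq.gz
# (single-end data); make_popmap is not part of Stacks.
make_popmap() {
    local samples_dir=$1 popmap=$2
    local f sample population
    mkdir -p "$(dirname "${popmap}")"
    : > "${popmap}"                          # start from an empty file
    for f in "${samples_dir}"/*.fq.gz; do
        [ -e "${f}" ] || continue            # skip if the glob matched nothing
        sample=$(basename "${f}" .fq.gz)     # e.g. north_01
        population=${sample%%_*}             # text before the first underscore
        printf '%s\t%s\n' "${sample}" "${population}" >> "${popmap}"
    done
}

# Example call matching the paths used in the job script:
# make_popmap ./samples ./popmaps/popmap
```

With files north_01.fq.gz and south_01.fq.gz in ./samples, this writes two lines, each pairing a sample name with the population prefix taken from its file name. If your files follow a different naming scheme, it is safer to write the population map by hand.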