SPAdes
SPAdes is an assembly toolkit containing various assembly pipelines.
SPAdes is available as a module on Apocrita.
Usage
To run the default installed version of SPAdes, load the spades module:
$ module load spades
$ spades.py --help
SPAdes genome assembler <VERSION>
Usage: spades.py [options] -o <output_dir>
Basic options:
-o <output_dir> directory to store all the resulting files (required)
--isolate this flag is highly recommended for high-coverage isolate and multi-cell data
--sc this flag is required for MDA (single-cell) data
--meta this flag is required for metagenomic data
--bio this flag is required for biosyntheticSPAdes mode
--corona this flag is required for coronaSPAdes mode
--rna this flag is required for RNA-Seq data
--plasmid runs plasmidSPAdes pipeline for plasmid detection
--metaviral runs metaviralSPAdes pipeline for virus detection
--metaplasmid runs metaplasmidSPAdes pipeline for plasmid detection in metagenomic datasets (equivalent for --meta --plasmid)
--rnaviral this flag enables virus assembly module from RNA-Seq data
--iontorrent this flag is required for IonTorrent data
--test runs SPAdes on toy dataset
-h, --help prints this usage message
-v, --version prints version
Advanced options:
--dataset <filename> file with dataset description in YAML format
-t/--threads <int> number of threads [default: 16]
-m/--memory <int> RAM limit for SPAdes in Gb (terminates if exceeded) [default: 250]
--tmp-dir <dirname> directory for temporary files [default: <output_dir>/tmp]
-k <int,int..> comma-separated list of k-mer sizes (must be odd and less than 128) [default: 'auto']
--cov-cutoff <float> coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64> PHRED quality offset in the input reads (33 or 64) [default: auto-detect]
--custom-hmms <dirname> directory with custom hmms that replace default ones [default: None]
For full usage documentation, run spades.py --help.
Example job
Selecting the number of threads and memory
By default, SPAdes runs multi-threaded on 16 cores with a memory limit
of 250GB (or all available memory on nodes with less than 250GB). To
avoid overloading a compute node, override these defaults by passing
the --threads parameter with the value of ${NSLOTS} and the --memory
parameter with the value of ${SGE_HGR_m_mem_free%.*}.
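The ${SGE_HGR_m_mem_free%.*} form is standard shell parameter expansion: it removes the shortest trailing match of the pattern .* (the last dot and everything after it), so that --memory receives a whole number. The sketch below illustrates this outside of a job. The value "1.000G" and the helper name strip_fraction are assumptions made for illustration only; inside a real job the scheduler sets the variable, and its exact format may differ.

```shell
#!/bin/bash
# Hypothetical helper (not part of SPAdes or the scheduler): applies the
# same %.* expansion used in the job script to its first argument.
strip_fraction() {
    printf '%s\n' "${1%.*}"
}

# Assumed example value; the scheduler sets the real one inside a job.
SGE_HGR_m_mem_free="1.000G"

# Removes ".000G", leaving the integer part that is passed to --memory.
mem_gb="$(strip_fraction "${SGE_HGR_m_mem_free}")"
echo "${mem_gb}"
```

If the value contains no dot, the pattern does not match and the value is passed through unchanged, so it is worth echoing the result at the start of a job when in doubt.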
Serial job
Here is an example job running on 1 core and 1GB of memory:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load spades
spades.py -o <output_dir> \
-1 example1.fastq \
-2 example2.fastq \
--threads ${NSLOTS} \
--memory ${SGE_HGR_m_mem_free%.*}