MaSuRCA
MaSuRCA is a genome assembler for both PacBio and Illumina data that combines the benefits of the de Bruijn graph and Overlap-Layout-Consensus assembly approaches.
MaSuRCA is available as a module on Apocrita.
Usage
To run the default installed version of MaSuRCA, load the masurca module:
$ module load masurca
$ masurca -h
Options:
-t, --threads ONLY to use with -i option, number of threads
-i, --illumina Run assembly without creating configuration file,
argument can be illumina_paired_end_forward_reads or
illumina_paired_end_forward_reads,
illumina_paired_end_reverse_reads.
Illumina read file names must be comma-separated,
without a space in the middle.
Illumina read files must be fastq, with valid quality
values, can be gzipped.
-r, --reads ONLY to use with -i option, single long reads file for
hybrid assembly, can be Nanopore or PacBio, fasta or
fastq, can be gzipped
-v, --version Report version
-o, --output Assembly script (assemble.sh)
-g, --generate Generate example configuration file
-p, --path Prepend to PATH in assembly script
-l, --ld-library-path Prepend to LD_LIBRARY_PATH in assembly script
--skip-checking Skip checking availability of other executables
-h, --help This message
For usage documentation, run masurca --help.
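As the help text describes, the -i option runs an assembly directly from the read files without a configuration file. The following is a minimal sketch of how such a command could be put together; the file names are placeholders rather than real data, and the final command is only printed so that it can be checked before being run inside a job.

```shell
#!/bin/bash
# Placeholder input files (assumptions for illustration only).
fwd="sample_R1.fastq.gz"
rev="sample_R2.fastq.gz"

# Illumina read files must be comma-separated, with no space in between.
illumina_arg="${fwd},${rev}"

# -t is only valid together with -i; NSLOTS is set by the scheduler inside a
# parallel environment job, so fall back to 1 thread when it is unset.
cmd=(masurca -i "${illumina_arg}" -t "${NSLOTS:-1}")

# For a hybrid assembly, a single long-read file could be appended with -r:
# cmd+=(-r long_reads.fastq.gz)

# Print the command for inspection; replace echo with "${cmd[@]}" to run it.
echo "${cmd[@]}"
```

Building the command as an array keeps each argument intact even if a path contains spaces.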
Example job
Selecting the number of threads
To prevent overloading a compute node, you should include the NUM_THREADS=X parameter in your configuration file, where X is equal to the number of cores requested.
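One way to keep the two values in step is to set NUM_THREADS from within the job script. The sketch below assumes a Grid Engine style scheduler, which sets the NSLOTS variable to the number of cores granted to a parallel environment job. The set_num_threads function is a hypothetical helper written for this example and is not part of MaSuRCA.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the NUM_THREADS line of a MaSuRCA config file.
# Usage: set_num_threads CONFIG_FILE THREADS
set_num_threads() {
    local cfg="$1" threads="$2"
    # Replace the value on any line that starts with NUM_THREADS=
    sed -i "s/^NUM_THREADS=.*/NUM_THREADS=${threads}/" "$cfg"
}

# Inside a job script: use the granted core count, or 1 if NSLOTS is unset
# (for example when trying the script outside a job).
if [ -f example.cfg ]; then
    set_num_threads example.cfg "${NSLOTS:-1}"
fi
```

This would need to run before masurca example.cfg, since the thread count is read from the configuration file when assemble.sh is generated.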
Serial job
Here is an example job running on 2 cores with 4GB of memory in total (2GB per core):
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load masurca
masurca example.cfg
./assemble.sh
Here is the supporting example.cfg file:
DATA
PE= pe 180 20 /path/to/example.fastq
END
PARAMETERS
GRAPH_KMER_SIZE=auto
NUM_THREADS=2
JF_SIZE=200000000
END
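In the PE line, the fields after PE= are a two-letter library prefix, the mean and standard deviation of the library insert size, and then the forward reads file, optionally followed by the reverse reads file. As a rough sketch, a paired-end version of the same configuration could be generated from a job script as shown below. The write_masurca_cfg function and the read file paths are hypothetical, and the values 180 and 20 are copied from the example above, so they should be replaced with values that match your library.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal paired-end MaSuRCA configuration file.
# Usage: write_masurca_cfg OUT_CFG FORWARD_FASTQ REVERSE_FASTQ THREADS
write_masurca_cfg() {
    local out="$1" fwd="$2" rev="$3" threads="$4"
    cat > "$out" <<EOF
DATA
PE= pe 180 20 ${fwd} ${rev}
END
PARAMETERS
GRAPH_KMER_SIZE=auto
NUM_THREADS=${threads}
JF_SIZE=200000000
END
EOF
}

# Example call with placeholder paths and 2 threads.
write_masurca_cfg paired_example.cfg /path/to/example_R1.fastq /path/to/example_R2.fastq 2
```

The resulting file would be passed to masurca in the same way as example.cfg in the job script above.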