Canu¶
A single molecule sequence assembler for large and small genomes.
Usage¶
To use the default version of canu:
$ module load canu
$ canu
usage: canu [-version] [-citation] \
[-haplotype | -correct | -trim | -assemble | -trim-assemble] \
[-s <assembly-specifications-file>] \
-p <assembly-prefix> \
-d <assembly-directory> \
genomeSize=<number>[g|m|k] \
[other-options] \
[-haplotype{NAME} illumina.fastq.gz] \
[-corrected] \
[-trimmed] \
[-pacbio |
-nanopore |
-pacbio-hifi] file1 file2 ...
For full usage documentation, run canu --help.
Error message - Gatekeeper detected problems in your input reads
To resolve this error message, supplement each dataset with Illumina reads. Renaming the files in <assembly-directory>/correction or using the stopOnReadQuality=false option will produce an undesirable assembly.
When you run a canu command in the form shown above, an Apocrita submission script is created and submitted. This script is available under:
<assembly-directory>/canu-scripts/canu.N.sh
where N is an incrementing number used internally by canu to distinguish between Apocrita submissions.
This script contains the canu parameters passed in the original command and is submitted as a new job on Apocrita to start the canu sequence assembler. The output of that job is written to:
<assembly-directory>/canu-scripts/canu.N.out
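Because N changes with each submission, it can be handy to look up the newest output file rather than guess the number. The following is a minimal sketch, not part of canu: `latest_canu_out` is a hypothetical helper, and it assumes N is zero-padded (as in canu.01.out) so that the shell's lexical glob order matches submission order.

```shell
# Hypothetical helper (not part of canu): print the path of the newest
# canu.N.out file under <assembly-directory>/canu-scripts.
# Assumption: N is zero-padded, so lexical glob order equals submission order.
latest_canu_out() {
    assembly_dir=${1:-}
    latest=""
    for f in "$assembly_dir"/canu-scripts/canu.*.out; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        latest=$f                 # globs expand in sorted order; keep the last
    done
    if [ -z "$latest" ]; then
        echo "no canu output found in $assembly_dir/canu-scripts" >&2
        return 1
    fi
    printf '%s\n' "$latest"
}

# Example:
#   tail -n 20 "$(latest_canu_out ecoli-pacbio)"
```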
Example job¶
canu is unusual compared to other applications on Apocrita. Typically we expect users to prepare submission scripts and use the qsub command to submit them; canu creates the script and submits it automatically.
Serial job¶
Load the canu module using the preExec argument
canu doesn't automatically load any modules in the job scripts it generates, even for itself, so you must add preExec='module load canu' to your canu submit command to ensure the canu module is loaded along with all required dependencies such as java, perl and gnuplot.
Here are some examples of canu commands which will submit an Apocrita job running with 4 cores and 8GB of memory per core (default is 1 core and 4GB memory per core), loading the canu module using the preExec= argument:
$ canu \
gridEngineResourceOption='-l h_vmem=8G -pe smp 4' \
-p 'ecoli' \
-d 'ecoli-pacbio' \
genomeSize=4.8m \
-pacbio pacbio.fastq \
preExec='module load canu'
$ canu \
gridEngineResourceOption='-l h_vmem=8G -pe smp 4' \
-p ecoli \
-d ecoli-oxford \
genomeSize=4.8m \
maxInputCoverage=100 \
-nanopore ecolk12mg1655_R10_3_guppy_345_HAC.fastq \
preExec='module load canu'
Example input files can be found in the Canu Quick Start guide.
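The resource string passed to gridEngineResourceOption in the examples above follows directly from the core count and the per-core memory. As a rough sketch, it could be built with a small shell function; `canu_resource_option` is a hypothetical helper, not something canu or Apocrita provides.

```shell
# Hypothetical helper (not part of canu): build the Grid Engine resource
# string from a core count and a per-core memory size in GB.
canu_resource_option() {
    cores=${1:-}
    mem_gb=${2:-}
    # Both arguments must be non-empty and contain digits only.
    for value in "$cores" "$mem_gb"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "usage: canu_resource_option <cores> <GB-per-core>" >&2
                return 1 ;;
        esac
    done
    printf '%s\n' "-l h_vmem=${mem_gb}G -pe smp ${cores}"
}

# Example: 4 cores with 8GB per core, as in the commands above.
#   canu gridEngineResourceOption="$(canu_resource_option 4 8)" ...
```

Note that h_vmem is requested per core here, so 4 cores at 8GB each amounts to 32GB for the job as a whole.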
When you run the first example (ecoli-pacbio), canu generates the submission script and prints the qsub command it runs, which should look something like this:
cd /data/home/abc/canu/ecoli-pacbio
qsub \
-l h_vmem=8G \
-pe smp 4 \
-cwd \
-N 'canu_ecoli' \
-j y \
-o canu-scripts/canu.01.out canu-scripts/canu.01.sh
Your job 1234567 ("canu_ecoli") has been submitted
The qsub command is executed automatically, so a new Apocrita job will be launched. The output of that job is written to:
/data/home/abc/canu/ecoli-pacbio/canu-scripts/canu.01.out
which is symlinked in the parent directory as canu.out.
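To follow up on a submission, the job ID can be pulled out of the confirmation line shown above and passed to qstat. This is a minimal sketch that assumes the confirmation text has been saved to a file; `job_id_from_qsub` and `submit.log` are hypothetical names used only for illustration.

```shell
# Hypothetical helper: read qsub confirmation text on stdin and print the
# numeric job ID from a line such as:
#   Your job 1234567 ("canu_ecoli") has been submitted
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .* has been submitted.*$/\1/p'
}

# Example (submit.log is an assumed file holding the text canu printed):
#   job_id=$(job_id_from_qsub < submit.log)
#   qstat -j "$job_id"                # scheduler's view of the job
#   tail -f ecoli-pacbio/canu.out     # follow the assembler's output
```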