
Canu

A single molecule sequence assembler for large and small genomes.

Usage

To use the default version of canu:

$ module load canu
$ canu

usage:   canu [-version] [-citation] \
              [-haplotype | -correct | -trim | -assemble | -trim-assemble] \
              [-s <assembly-specifications-file>] \
               -p <assembly-prefix> \
               -d <assembly-directory> \
               genomeSize=<number>[g|m|k] \
              [other-options] \
              [-haplotype{NAME} illumina.fastq.gz] \
              [-corrected] \
              [-trimmed] \
              [-pacbio |
               -nanopore |
               -pacbio-hifi] file1 file2 ...

For full usage documentation, run canu --help.

Error message - Gatekeeper detected problems in your input reads

To resolve this error message, supplement each dataset with Illumina reads. Renaming the files in <assembly-directory>/correction or using the stopOnReadQuality=false option will produce an undesirable assembly.

When you run a canu command of the form shown above, canu creates an Apocrita submission script and submits it. The script is saved as <assembly-directory>/canu-scripts/canu.N.sh, where N is a number that canu increments to distinguish between its Apocrita submissions.

The script contains the canu parameters from your original command and runs as a new Apocrita job that starts the canu sequence assembler. The output of this job is written to: <assembly-directory>/canu-scripts/canu.N.out.
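Because N increases with each submission, it can be handy to look up the most recent script and its output file. The following is a minimal sketch, assuming the canu.N.sh and canu.N.out naming described above and GNU sort; latest_canu_files is a hypothetical helper for illustration and is not part of canu.

```shell
#!/bin/bash
# latest_canu_files is a hypothetical helper, not part of canu.
# ASSUMPTION: submission files are named canu.N.sh and canu.N.out inside
# <assembly-directory>/canu-scripts, as described above.
latest_canu_files() {
    local dir="$1/canu-scripts"
    local script
    # Version sort (GNU sort -V) so that canu.10.sh sorts after canu.09.sh.
    script=$(printf '%s\n' "$dir"/canu.*.sh | sort -V | tail -n 1)
    # If the glob matched nothing, it is left unexpanded and names no real file.
    if [ ! -e "$script" ]; then
        echo "no canu submission scripts found in $dir" >&2
        return 1
    fi
    echo "$script"              # the most recent submission script
    echo "${script%.sh}.out"    # the output file of the matching job
}

# Example, using an assembly directory named ecoli-pacbio:
#   latest_canu_files ecoli-pacbio
```

The second line printed is only the expected output path; the file itself appears once the job has started writing output.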

Example job

canu is unusual compared to other applications on Apocrita. Typically, we expect users to prepare submission scripts and submit them with the qsub command.

canu creates the script and submits it automatically.

Serial job

Load the canu module using the preExec argument

canu does not load any modules in the job scripts it generates, not even its own, so you must add preExec='module load canu' to your canu command. This ensures that the job loads the canu module along with all required dependencies, such as java, perl and gnuplot.

The following example canu commands each submit an Apocrita job running with 4 cores and 8GB of memory per core (the default is 1 core and 4GB of memory per core), and load the canu module using the preExec= argument:

$ canu \
    gridEngineResourceOption='-l h_vmem=8G -pe smp 4' \
    -p 'ecoli' \
    -d 'ecoli-pacbio' \
    genomeSize=4.8m \
    -pacbio pacbio.fastq \
    preExec='module load canu'
$ canu \
    gridEngineResourceOption='-l h_vmem=8G -pe smp 4' \
    -p ecoli \
    -d ecoli-oxford \
    genomeSize=4.8m \
    maxInputCoverage=100 \
    -nanopore ecolk12mg1655_R10_3_guppy_345_HAC.fastq \
    preExec='module load canu'

Example input files can be found in the Canu Quick Start guide.

When you run the first command, canu reports the job submission it performs, which should look something like this:

    cd /data/home/abc/canu/ecoli-pacbio
    qsub \
      -l h_vmem=8G \
      -pe smp 4  \
      -cwd \
      -N 'canu_ecoli' \
      -j y \
      -o canu-scripts/canu.01.out  canu-scripts/canu.01.sh
Your job 1234567 ("canu_ecoli") has been submitted

canu executes the qsub command automatically, launching a new Apocrita job. The output of that job is written to: /data/home/abc/canu/ecoli-pacbio/canu-scripts/canu.01.out which is symlinked in the parent directory as canu.out.
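To check on the submitted job later, you need its job ID, which appears in the "Your job ... has been submitted" line. The sketch below pulls the ID out of that line, assuming the message has exactly the form shown in the sample output above; job_id_from_submission and submit.log are hypothetical names used for illustration and are not part of canu.

```shell
#!/bin/bash
# job_id_from_submission is a hypothetical helper, not part of canu.
# It reads submission output on stdin and prints the job ID.
# ASSUMPTION: the message has the form shown in the sample output above:
#   Your job 1234567 ("canu_ecoli") has been submitted
job_id_from_submission() {
    # Print only the numeric ID from matching lines; keep the last match.
    sed -n 's/^Your job \([0-9][0-9]*\) .*has been submitted.*$/\1/p' | tail -n 1
}

# Possible usage (submit.log is a hypothetical file name):
#   canu ... 2>&1 | tee submit.log
#   qstat -j "$(job_id_from_submission < submit.log)"
```

If the input contains no such line, the function prints nothing, so check for an empty result before passing it to qstat.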

References