SRA Tools

The Sequence Read Archive (SRA) Toolkit is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.

SRA Tools is available as a module on Apocrita.

Usage

To run the default installed version of SRA Tools, simply load the sra-tools module:

$ module load sra-tools
Usage:   <command> [options] [--help]

Setup of SRA Tools

The first time you use SRA Tools, run vdb-config --interactive and press 'x' to save a basic configuration. This step is not needed for later runs.
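To check from a script whether that first-time configuration has already been saved, something like the following may help. It is a minimal sketch: the helper name sra_config_status is hypothetical, and the path $HOME/.ncbi/user-settings.mkfg is an assumption about where vdb-config stores its settings by default, which may differ between versions or sites.

```shell
# Hypothetical helper: report whether an SRA Tools user configuration exists.
# Assumption: vdb-config saves its settings to ~/.ncbi/user-settings.mkfg.
sra_config_status() {
    local config_file="${1:-$HOME/.ncbi/user-settings.mkfg}"
    if [ -s "$config_file" ]; then
        echo "configured: $config_file"
    else
        echo "not configured: run 'vdb-config --interactive' and press 'x'"
        return 1
    fi
}

# '|| true' keeps a script that uses 'set -e' from stopping here.
sra_config_status || true
```

An optional argument overrides the path, which is useful if your configuration lives elsewhere.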

The following commands are available:

abi-dump        align-info
fasterq-dump    fastq-dump
illumina-dump   ngs-pileup
prefetch        sam-dump
sff-dump        sra-pileup
sra-stat        var-expand
vdb-config      vdb-decrypt
vdb-dump        vdb-encrypt
vdb-validate

Usage for each command can be shown by running the command with the --help option. For more detailed documentation, please check the link to the documentation page listed in the references below.
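As a quick way to skim the options of several commands at once, the loop below prints the first few lines of each command's --help output. This is only a sketch: show_usage is a hypothetical helper name, and the three commands passed to it are arbitrary picks from the list above.

```shell
# Hypothetical helper: print the first lines of --help for each named command.
show_usage() {
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "== $cmd =="
            "$cmd" --help 2>&1 | head -n 5
        else
            # The command is not on PATH, most likely because the module is not loaded.
            echo "== $cmd == not found (is the sra-tools module loaded?)"
        fi
    done
}

show_usage fastq-dump prefetch sam-dump
```

Run it after module load sra-tools; without the module loaded, each command is reported as not found.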

Example jobs

Here are some example jobs using the main tools, each requesting 1 core and 2G of memory:

Fastq-dump

Converts SRA data into FASTQ format, or into FASTA format when the --fasta option is given.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sra-tools

# Writes each read to its own file (--split-files), giving two files for
# paired-end data, in FASTA format (--fasta) with 60 bases per line
# ("60" included after --fasta).
fastq-dump --split-files --fasta 60 input-file
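Once the job has finished, a quick sanity check on the output can confirm the record count and the 60-base line width. The following is a minimal sketch: fasta_summary is a hypothetical helper, and the output names input-file_1.fasta and input-file_2.fasta are an assumption about how the split files are named, so adjust them to match what the job actually produced.

```shell
# Hypothetical helper: count records and report the longest sequence line
# in a FASTA file. Header lines start with '>' and are not measured.
fasta_summary() {
    awk '
        /^>/ { n++; next }
        { if (length($0) > max) max = length($0) }
        END { printf "%d records, longest sequence line %d\n", n, max }
    ' "$1"
}

# Assumed output names; files that do not exist are skipped.
for f in input-file_1.fasta input-file_2.fasta; do
    [ -e "$f" ] || continue
    echo "$f: $(fasta_summary "$f")"
done
```

With --fasta 60, the longest sequence line reported should not exceed 60.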

Prefetch

Downloads SRA, dbGaP and ADSP data from the command line.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sra-tools

# Sets the maximum download size per file to 200GB (-X) and downloads
# the files listed in the kart file.
prefetch -X 200G input-file.krt
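prefetch can also be given a run accession instead of a kart file. The sketch below chains a download, an integrity check with vdb-validate, and a conversion with fastq-dump. The helper names run and fetch_and_convert are hypothetical, SRR000001 is only a placeholder accession, and the snippet assumes it is placed after module load sra-tools in a job script like the one above.

```shell
# Hypothetical helper: print each command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Download one run, check its integrity, then convert it.
# Stops at the first step that fails.
fetch_and_convert() {
    local acc="$1"
    run prefetch "$acc" || return 1
    run vdb-validate "$acc" || return 1
    run fastq-dump --split-files "$acc"
}

DRY_RUN=1   # set to 0 to actually run the commands
# SRR000001 is only a placeholder accession; replace it with your own.
fetch_and_convert SRR000001
```

With DRY_RUN=1 the three commands are only printed, which makes it easy to check them before submitting the job.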

Sam-dump

Converts SRA data to SAM format.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sra-tools

# Produces a gzip-compressed output file (--gzip) with a reconstructed header (-r).
sam-dump -r --gzip --output-file output-file.sam.gz input-file

SRA-pileup

Generates pileup statistics on aligned SRA data.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sra-tools

# Produces pileup stats for a given genomic region (-r), chromosome 1,
# bases 559140-559160 (1:559140-559160).
sra-pileup -r 1:559140-559160 input-file

VDB-config

Displays and modifies VDB configuration information.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sra-tools

# Imports a dbGaP repository key (.ngc file) from the command line.
vdb-config --import input-file.ngc

VDB-decrypt

Decrypts non-SRA dbGaP data.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sra-tools

# Decrypt a single encrypted file that has been downloaded.
vdb-decrypt input-file

References