SRA Tools
The Sequence Read Archive (SRA) Toolkit is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
SRA Tools is available as a module on Apocrita.
Usage
To run the default installed version of SRA Tools, simply load the sra-tools module:
$ module load sra-tools
Usage: <command> [options] [--help]
Setup of SRA Tools
When running SRA Tools for the first time, run vdb-config --interactive and press 'x' to save a basic configuration. This step is not needed for subsequent use.
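A job script can check that this one-off setup has been done before calling any tool. The following is a minimal sketch, assuming that vdb-config saves the user configuration to $HOME/.ncbi/user-settings.mkfg (the usual default location, but not stated on this page); sra_config_exists is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if an SRA Tools user configuration file
# exists and is non-empty.
# Assumption: the file lives at $HOME/.ncbi/user-settings.mkfg unless a
# different path is passed as the first argument.
sra_config_exists() {
    local config_file="${1:-$HOME/.ncbi/user-settings.mkfg}"
    [ -s "$config_file" ]
}

if sra_config_exists; then
    echo "SRA Tools configuration found."
else
    echo "No SRA Tools configuration found; run 'vdb-config --interactive' first."
fi
```

Because vdb-config --interactive needs a terminal, the check only reports the problem rather than trying to fix it from inside a batch job.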
The following commands are available:
abi-dump align-info
fasterq-dump fastq-dump
illumina-dump ngs-pileup
prefetch sam-dump
sff-dump sra-pileup
sra-stat var-expand
vdb-config vdb-decrypt
vdb-dump vdb-encrypt
vdb-validate
Usage for each command can be shown by running the command with the --help
option. For more detailed documentation, please check the link to the
documentation page listed in the references below.
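To confirm that the module has put the commands on the PATH before running a job, a small check can be used. This is a sketch only; missing_commands is a hypothetical helper name, and the three commands checked are an arbitrary selection from the list above.

```shell
#!/bin/bash
# Hypothetical helper: print each named command that is NOT found on the PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# After 'module load sra-tools', no output means all three were found.
missing_commands prefetch fastq-dump sam-dump
```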
Example jobs
Here are some example jobs using the main tools, each requesting 1 core and 2GB of memory:
Fastq-dump
This will convert SRA data into FASTQ format.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sra-tools
# Writes FASTA rather than FASTQ output (--fasta) with 60 bases per line
# ("60" after --fasta); paired reads go to two separate files (--split-files).
fastq-dump --split-files --fasta 60 input-file
Prefetch
Allows command line downloading of SRA, dbGaP and ADSP data.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sra-tools
# Sets the maximum download file size to 200GB (-X) and downloads the
# files listed in the cart (.krt) file.
prefetch -X 200G input-file.krt
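Downloading and conversion can be chained in a single job. The following is a minimal sketch, assuming that fastq-dump can resolve an accession that prefetch has already downloaded (usually the case with a default configuration, but not stated on this page). run_cmd and fetch_and_convert are hypothetical helper names, and SRR000001 is only a placeholder accession. With DRY_RUN=1, the default in this sketch, the commands are printed instead of executed.

```shell
#!/bin/bash
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command, or just print it in dry-run mode.
run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: download one accession, then convert it.
# The conversion is skipped if the download fails.
fetch_and_convert() {
    local accession="$1"
    run_cmd prefetch -X 200G "$accession" &&
    run_cmd fastq-dump --split-files "$accession"
}

# Placeholder accession; replace with the run of interest.
fetch_and_convert SRR000001
```

Set DRY_RUN=0 in the job script to execute the commands once the printed command lines look right.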
Sam-dump
Converts SRA data to SAM format.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sra-tools
# Produces a gzip-compressed output file (--gzip) with a reconstructed header (-r).
sam-dump -r --gzip --output-file output-file.sam.gz input-file
SRA-pileup
Generates pileup statistics on aligned SRA data.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sra-tools
# Produces pileup stats for a given genomic region (-r), chromosome 1,
# bases 559140-559160 (1:559140-559160).
sra-pileup -r 1:559140-559160 input-file
VDB-config
Displays and modifies VDB configuration information.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sra-tools
# Imports a dbGaP repository key from the command line.
vdb-config --import input-file.ngc
VDB-decrypt
Decrypts non-SRA dbGaP data.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sra-tools
# Decrypt a single encrypted file that has been downloaded.
vdb-decrypt input-file