Ensembl-VEP

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Conda installation

VEP can be installed from the Bioconda channel: load the Miniforge module, create a Conda environment, and install VEP into it (output below truncated):

$ module load miniforge
$ mamba create --quiet --yes --name vep_env
$ mamba activate vep_env
(vep_env) $ mamba install bioconda::ensembl-vep

Looking for: ['bioconda::ensembl-vep']
...
  Updating specs:

   - bioconda::ensembl-vep
...
Confirm changes: [Y/n] Y
...
Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction:
This package installs only the variant effect predictor (VEP) library
code. To install data libraries, you can use the 'vep_install' command
installed along with it. For example, to install the VEP library for human
GRCh38 to a directory

vep_install -a cf -s homo_sapiens -y GRCh38 -c /output/path/to/GRCh38/vep --CONVERT

(note that vep_install is renamed from INSTALL.pl
 to avoid having generic script names in the PATH)

The --CONVERT flag is not required but improves lookup speeds during
runs. See the VEP documentation for more details

http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html

done
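If you prefer to create the environment and install VEP in a single, non-interactive step (for example, from a setup script), the following is a minimal sketch. It assumes the Miniforge module has already been loaded so that mamba is on your PATH; create_vep_env is a hypothetical helper name, and --yes answers the "Confirm changes" prompt shown above automatically.

```bash
#!/bin/bash
# Hypothetical helper: create a Conda environment and install VEP into it
# in one non-interactive step. Assumes `module load miniforge` has already
# been run so that `mamba` is available.
create_vep_env() {
    local env_name="${1:-vep_env}"   # defaults to the name used on this page

    if ! command -v mamba >/dev/null 2>&1; then
        echo "mamba not found; run 'module load miniforge' first" >&2
        return 1
    fi

    # --yes skips the "Confirm changes" prompt
    mamba create --quiet --yes --name "$env_name" bioconda::ensembl-vep
}
```

After module load miniforge, running create_vep_env vep_env should produce the same environment as the interactive steps above; activate it with mamba activate vep_env as usual.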

Usage

To run VEP, load the miniforge module and activate the Conda environment you installed it into:

$ module load miniforge
$ mamba activate vep_env
(vep_env) $ vep --help
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : XYZ
  ensembl-compara      : XYZ
  ensembl-funcgen      : XYZ
  ensembl-io           : XYZ
  ensembl-variation    : XYZ
  ensembl-vep          : XYZ
...

For full option documentation, see the VEP options page in the Ensembl documentation.
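To make results reproducible, it can help to record in each job log which VEP release was used. The sketch below assumes that the Versions block printed by vep --help keeps the "ensembl-vep : <version>" layout shown above; vep_version is a hypothetical helper name, and the parsing will need adjusting if that layout changes.

```bash
#!/bin/bash
# Hypothetical helper: print the ensembl-vep version reported by `vep --help`.
# Assumes the help text contains a line of the form
#   "  ensembl-vep          : <version>"
vep_version() {
    vep --help 2>/dev/null |
        awk -F': *' '$1 ~ /^[ \t]*ensembl-vep[ \t]*$/ { print $2; exit }'
}
```

For example, adding echo "VEP version: $(vep_version)" to a job script after mamba activate vep_env writes the version into the job output.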

Database Configuration

VEP Database Configuration on Apocrita

No databases or cache files for Ensembl-VEP are provided on Apocrita, so you must download the data required for your runs in advance.

Use the vep_install command, as shown in the installation output above, to install data libraries.

Specify a data library location

You should specify a location for your data libraries using the -c flag, as shown below. The default location is $HOME/.vep/; if you do not specify another location, such as your scratch or Research Group storage space, you may fill up your home directory.

e.g.:

$ vep_install -a cf -s homo_sapiens -y GRCh38 -c /output/path/to/GRCh38/vep --CONVERT
 - getting list of available cache files
 - downloading https://ftp.ensembl.org/pub/release-XYZ/variation/indexed_vep_cache/homo_sapiens_vep_XYZ_GRCh38.tar.gz
 - unpacking homo_sapiens_vep_XYZ_GRCh38.tar.gz
 - converting cache, this may take some time but will allow VEP to look up variants and frequency data much faster
 - use CTRL-C to cancel if you do not wish to convert this cache now (you may run convert_cache.pl later)
<DATE> <TIME> - Processing homo_sapiens
<DATE> <TIME> - Processing version XYZ_GRCh38
<DATE> <TIME> - No unprocessed types remaining, skipping
<DATE> <TIME> - All done!
 - downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz
 - downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz.fai
 - downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use
"--fasta /output/path/to/GRCh38/vep/homo_sapiens/113_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"


All done

Replace /output/path/to/ with a path of your choice. Note that this path must already exist; otherwise, the vep_install command will fail.
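Because vep_install fails if the target directory does not exist, a setup script can create it first. This is a minimal sketch: install_vep_cache is a hypothetical helper, the vep_install flags are exactly those used in the example above, and it assumes the vep_env environment is already active. The directory you pass should be on scratch or Research Group storage rather than in your home directory.

```bash
#!/bin/bash
# Hypothetical helper: create the cache directory, then download the
# GRCh38 human cache and FASTA into it with the flags used on this page.
# Assumes the vep_env environment is active so that vep_install is on PATH.
install_vep_cache() {
    local cache_dir="$1"

    if [[ -z "$cache_dir" ]]; then
        echo "usage: install_vep_cache /output/path/to/GRCh38/vep" >&2
        return 1
    fi

    # vep_install will not create the directory for you
    mkdir -p "$cache_dir" || return 1

    # -a cf: cache and FASTA; -s: species; -y: assembly; -c: target directory
    vep_install -a cf -s homo_sapiens -y GRCh38 -c "$cache_dir" --CONVERT
}
```

For example, install_vep_cache /output/path/to/GRCh38/vep, with the placeholder path replaced by your own.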

To run VEP in offline mode using the downloaded cache, pass the --offline and --cache flags.
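When running with --offline and --cache, VEP should detect the FASTA file by itself; if it does not, the vep_install output above suggests passing --fasta. The sketch below locates that file, assuming the directory layout shown in that output (<cache_dir>/homo_sapiens/<release>_GRCh38/). find_vep_fasta is a hypothetical helper; it deliberately refuses to choose if more than one release is installed.

```bash
#!/bin/bash
# Hypothetical helper: print the path of the GRCh38 toplevel FASTA that
# vep_install downloaded. Assumes the layout shown in the vep_install output:
#   <cache_dir>/homo_sapiens/<release>_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
find_vep_fasta() {
    local cache_dir="$1"
    local -a matches
    matches=("$cache_dir"/homo_sapiens/*_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz)

    # By default bash leaves an unmatched pattern as-is, so test for existence
    if [[ ! -e "${matches[0]}" ]]; then
        echo "no GRCh38 FASTA found under $cache_dir" >&2
        return 1
    fi

    # Refuse to guess between several installed releases
    if (( ${#matches[@]} > 1 )); then
        echo "several GRCh38 FASTA files found; pass --fasta explicitly" >&2
        return 1
    fi

    printf '%s\n' "${matches[0]}"
}
```

For example, fasta="$(find_vep_fasta /output/path/to/GRCh38/vep)" stores the path, which you can then pass to vep as --fasta "$fasta".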

Example job

Serial job

Specify your cache directory

You must specify the chosen cache directory in your job script using the --dir option; otherwise, VEP will not find it.

Here is an example job running on 1 core and 1GB of memory:

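#!/bin/bash
#$ -cwd             # Set the working directory for the job to the current directory
#$ -j y             # Merge standard output and standard error
#$ -pe smp 1        # Request 1 core
#$ -l h_rt=1:0:0    # Request 1 hour runtime
#$ -l h_vmem=1G     # Request 1GB RAM

module load miniforge
mamba activate vep_env

# --dir must point at the cache directory populated by vep_install
vep -i homo_sapiens_GRCh38.vcf \
    --cache \
    --dir /output/path/to/GRCh38/vep \
    --offline \
    --output_file results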
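A job that starts with a missing input file or an empty cache directory wastes queue time, so it can be worth checking both before calling VEP. This is a minimal sketch: run_vep_offline is a hypothetical helper that runs vep with the same options as the job script above, after checking that the input VCF is readable and the cache directory exists.

```bash
#!/bin/bash
# Hypothetical helper for the job script above: fail early with a clear
# message if the input VCF or the cache directory is missing, then run VEP
# in offline mode with the same options as the example job.
run_vep_offline() {
    local input="$1" cache_dir="$2" output="$3"

    if [[ ! -r "$input" ]]; then
        echo "input file not readable: $input" >&2
        return 1
    fi
    if [[ ! -d "$cache_dir" ]]; then
        echo "cache directory not found: $cache_dir (has vep_install been run?)" >&2
        return 1
    fi

    vep -i "$input" \
        --cache \
        --dir "$cache_dir" \
        --offline \
        --output_file "$output"
}
```

In the job script, define the function after mamba activate vep_env and call run_vep_offline homo_sapiens_GRCh38.vcf /output/path/to/GRCh38/vep results in place of the direct vep command.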
