Ensembl-VEP
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Conda installation
VEP can be installed from the Bioconda channel: load the Miniforge module, create a Conda environment, and install VEP into it (the output below is truncated):
$ module load miniforge
$ mamba create --quiet --yes --name vep_env
$ mamba activate vep_env
(vep_env) $ mamba install bioconda::ensembl-vep
Looking for: ['bioconda::ensembl-vep']
...
Updating specs:
- bioconda::ensembl-vep
...
Confirm changes: [Y/n] Y
...
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction:
This package installs only the variant effect predictor (VEP) library
code. To install data libraries, you can use the 'vep_install' command
installed along with it. For example, to install the VEP library for human
GRCh38 to a directory
vep_install -a cf -s homo_sapiens -y GRCh38 -c /output/path/to/GRCh38/vep --CONVERT
(note that vep_install is renamed from INSTALL.pl
to avoid having generic script names in the PATH)
The --CONVERT flag is not required but improves lookup speeds during
runs. See the VEP documentation for more details
http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html
done
Usage
To run VEP, load the miniforge module and activate the Conda environment you installed it into:
$ module load miniforge
$ mamba activate vep_env
(vep_env) $ vep --help
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#
Versions:
ensembl : XYZ
ensembl-compara : XYZ
ensembl-funcgen : XYZ
ensembl-io : XYZ
ensembl-variation : XYZ
ensembl-vep : XYZ
...
For full option documentation, see the VEP documentation.
Database Configuration
VEP Database Configuration on Apocrita
There are no databases or cache files for Ensembl-VEP available on Apocrita. The required data for each run must be downloaded in advance.
You must use the vep_install command, as detailed in the installation instructions above, to install data libraries.
Specify a data library location
You should specify a location for your data libraries using the -c flag, as shown below. The default location is $HOME/.vep/, and you may fill up your home directory if you do not specify another location, such as your scratch or Research Group storage space.
For example:
$ vep_install -a cf -s homo_sapiens -y GRCh38 -c /output/path/to/GRCh38/vep --CONVERT
- getting list of available cache files
- downloading https://ftp.ensembl.org/pub/release-XYZ/variation/indexed_vep_cache/homo_sapiens_vep_XYZ_GRCh38.tar.gz
- unpacking homo_sapiens_vep_XYZ_GRCh38.tar.gz
- converting cache, this may take some time but will allow VEP to look up variants and frequency data much faster
- use CTRL-C to cancel if you do not wish to convert this cache now (you may run convert_cache.pl later)
<DATE> <TIME> - Processing homo_sapiens
<DATE> <TIME> - Processing version XYZ_GRCh38
<DATE> <TIME> - No unprocessed types remaining, skipping
<DATE> <TIME> - All done!
- downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz
- downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz.fai
- downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz.gzi
The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use
"--fasta /output/path/to/GRCh38/vep/homo_sapiens/XYZ_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"
All done
Replace /output/path/to/ with a path of your choice. Note that this path must already exist; otherwise, the vep_install command will fail.
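Because the target directory must exist before vep_install runs, it can help to create it first. The following is a minimal sketch, not part of VEP itself: the VEP_CACHE_DIR variable and the temporary fallback path are assumptions made for illustration, so substitute your own scratch or Research Group storage path.

```shell
#!/bin/bash
# Hypothetical cache location: set VEP_CACHE_DIR to your scratch or
# Research Group storage path. The fallback is a temporary directory so
# that this sketch is safe to run as-is.
cache_dir="${VEP_CACHE_DIR:-${TMPDIR:-/tmp}/vep_cache_example}"

# vep_install fails if the target directory is missing, so create it first.
mkdir -p "$cache_dir"

# Show the free space on the filesystem that will hold the cache.
df -h "$cache_dir"

# Print the install command from above, pointed at the new directory.
# It is printed rather than run because it starts a download.
echo "vep_install -a cf -s homo_sapiens -y GRCh38 -c $cache_dir --CONVERT"
```

Once the directory exists and has enough free space, run the printed command inside the activated Conda environment.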
To enable offline mode and use of the cache, pass the --offline and --cache flags to vep.
Example job
Serial job
Specify your cache directory
You must specify the chosen cache directory in your job script using the --dir option; otherwise, the cache won't be found.
Here is an example job using 1 core and 1 GB of memory:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load miniforge
mamba activate vep_env
vep -i homo_sapiens_GRCh38.vcf \
    --cache \
    --dir /output/path/to/GRCh38/vep \
    --offline \
    --output_file results
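Since a missing cache only shows up once the job is running, a pre-flight check in the job script can save a queued job from failing late. The sketch below assumes the cache layout shown in the vep_install output above (a species directory containing a version and assembly subdirectory). The check_vep_cache function is a hypothetical helper, not part of VEP.

```shell
#!/bin/bash
# Hypothetical helper (not part of VEP): succeed only if a cache for the
# given species and assembly exists under the cache directory.
check_vep_cache() {
    local cache_dir="$1" species="$2" assembly="$3"
    local d
    # Assumed layout: <cache_dir>/<species>/<version>_<assembly>
    for d in "$cache_dir/$species"/*_"$assembly"; do
        if [ -d "$d" ]; then
            echo "Found cache: $d"
            return 0
        fi
    done
    echo "No $species $assembly cache found under $cache_dir" >&2
    return 1
}
```

In a job script, a line such as `check_vep_cache /output/path/to/GRCh38/vep homo_sapiens GRCh38 || exit 1` placed before the vep command would stop the job early when the cache is missing.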