RSEM
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.
Conda installation
To install RSEM from the Bioconda channel, load the Miniforge module, create a Conda environment, and install RSEM into it (output below truncated):
$ module load miniforge
$ mamba create --quiet --yes --name rsem_env
$ mamba activate rsem_env
(rsem_env) $ mamba install bioconda::rsem
Looking for: ['bioconda::rsem']
...
Updating specs:
- bioconda::rsem
...
Confirm changes: [Y/n] Y
...
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To take advantage of RSEM's built-in support for the Bowtie, Bowtie 2, HISAT2 and STAR alignment programs, you must install the relevant alignment package(s) in the same Conda environment as RSEM. The following command installs Bowtie 2, HISAT2 and STAR into that environment:
(rsem_env) $ mamba install -c bioconda bowtie2 hisat2 star
For compatibility, we recommend not loading any of these packages via modules.
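After installing, it can be useful to confirm that RSEM and the aligners resolve from the active Conda environment. The following is a minimal sketch, not part of the official instructions: `check_tools` is a hypothetical helper, and the executable names (`bowtie2`, `hisat2`, `STAR`) are assumed from the Bioconda packages installed above.

```shell
# Report whether each named executable is on PATH in the current environment.
# Returns non-zero if at least one tool is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions based on the packages installed above.
check_tools rsem-calculate-expression bowtie2 hisat2 STAR || true
```

Running `command -v bowtie2` directly also prints the resolved path, which should sit inside the `rsem_env` environment directory.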
Usage
To run RSEM, load the miniforge module and activate the Conda environment you installed it into:
module load miniforge
mamba activate rsem_env
After activating the Conda environment, the following commands are available:
convert-sam-for-rsem rsem-plot-model
extract-transcript-to-gene-map-from-trinity rsem-plot-transcript-wiggles
rsem-bam2readdepth rsem-prepare-reference
rsem-bam2wig rsem-preref
rsem-build-read-index rsem-refseq-extract-primary-assembly
rsem-calculate-credibility-intervals rsem-run-em
rsem-calculate-expression rsem-run-gibbs
rsem-extract-reference-transcripts rsem-sam-validator
rsem-generate-data-matrix rsem-scan-for-paired-end-reads
rsem-gen-transcript-plots rsem-simulate-reads
rsem-get-unique rsem-synthesis-reference-transcripts
rsem-gff3-to-gtf rsem-tbam2gbam
rsem-parse-alignments
To show the usage for any command, run it with the --help option. For more detailed documentation, see the GitHub page linked below.
Core Usage
Not all RSEM commands support multi-threading. Commands that do should be given the -p ${NSLOTS} or --num-threads ${NSLOTS} option so that they use the number of cores allocated to the job.
Please check the official documentation before running jobs.
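As an illustration of the thread option, the sketch below passes the allocated core count to `rsem-calculate-expression`. The file names, reference name and sample name are hypothetical, and `--bowtie2` assumes the reference was prepared with Bowtie 2 indices. The command is only printed, so it can be inspected before use.

```shell
# Fall back to a single thread when NSLOTS is unset (e.g. outside a job).
threads="${NSLOTS:-1}"

# Dry run: print the command instead of executing it.
# All file and sample names below are placeholders for illustration.
echo rsem-calculate-expression --num-threads "$threads" \
    --bowtie2 --paired-end \
    sample_R1.fastq sample_R2.fastq \
    reference_name sample_name
```

Remove the leading `echo` to run the command for real inside a job script.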
Example job
Serial job
Here is an example job running on 2 cores with 4GB of memory in total (2GB per core):
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load miniforge
mamba activate rsem_env
# Convert a transcript coordinate BAM alignments file into a genomic
# coordinate BAM alignments file
rsem-tbam2gbam reference_name \
unsorted_transcript_bam_input \
genome_bam_output \
-p ${NSLOTS}
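A job like the one above can spend time queued only to fail because an input is absent. One optional safeguard, sketched here under the assumption that the input is a regular file, is to check it before calling RSEM. `require_file` is a hypothetical helper, not an RSEM command.

```shell
# Print an error and return non-zero if a required input file is
# missing or empty.
require_file() {
    if [ ! -s "$1" ]; then
        echo "ERROR: required input '$1' is missing or empty" >&2
        return 1
    fi
}

# Hypothetical usage in the job script, before calling rsem-tbam2gbam:
#   require_file unsorted_transcript_bam_input || exit 1
```

Because the script uses `#$ -j y`, the error message ends up in the same job output file as the rest of the log.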