Andrena cluster
The Andrena cluster is a set of GPU nodes purchased with a Research Capital Investment Fund to support the University's Digital Environment Research Institute (DERI).
Hardware
The cluster comprises 16 GPU nodes, each with four Nvidia A100 GPUs, providing a total of 64 GPUs. The Andrena nodes are joined to Apocrita and make use of the same job scheduler and high performance networking and storage.
DERI research groups may additionally make use of a portion of the 50TB DERI storage entitlement, while commonly used read-only datasets (e.g. training datasets for machine learning) can be hosted on high performance SSD storage.
Requesting access
To request access to the Andrena computational resources or storage, please contact us to discuss requirements.
Logging in to Andrena
The login procedure is the same as for Apocrita. Please refer to the documentation below for details of how to submit jobs specifically to the Andrena cluster nodes.
Running jobs on Andrena
Workloads are submitted using the job scheduler and work in exactly the same way as on Apocrita, which is documented thoroughly on this site. If you have been approved to use Andrena, jobs can be submitted to it by adding the following request to the resource request section of the job script:
#$ -l cluster=andrena
Without this setting, the scheduler will try to run the job on either Apocrita or Andrena nodes, depending on availability.
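As a brief, hedged example of how this fits into the normal scheduler workflow (the script name and job ID below are placeholders), the job is submitted and inspected with the usual commands:
# Submit a job script that includes the cluster=andrena request
# (train_job.sh is a placeholder name)
qsub train_job.sh
# Check the requested resources for the queued job; the hard resource
# list should include cluster=andrena (123456 is a placeholder job ID)
qstat -j 123456 | grep -i resource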
Requesting 12 cores per GPU
By default, Andrena will only accept jobs requesting 8 cores per GPU. To request 12 cores per GPU, ensure the -l cluster=andrena parameter is requested, otherwise your job will be rejected upon submission.
Submitted jobs should follow a similar template to Apocrita GPU jobs if requesting 8 cores per GPU (the default). If using the -l cluster=andrena option to request 12 cores per GPU, the memory request must be reduced to 7.5G RAM per core, otherwise your job will be rejected upon submission.
An example GPU job script using a conda environment might look like:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 12 # 12 cores per GPU
#$ -l h_rt=240:0:0 # 240 hours runtime
#$ -l h_vmem=7.5G # 7.5G RAM per core
#$ -l gpu=1 # request 1 GPU
#$ -l cluster=andrena # use the Andrena nodes and enable 12 cores per GPU
module load miniforge
mamba activate tensorflow_env
python train.py
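The tensorflow_env environment referenced in the script is assumed to exist already. As a rough sketch (the environment name and package list are illustrative, not prescribed), it could be created once on a login node before submitting the job:
module load miniforge
# Create a conda environment named tensorflow_env containing TensorFlow
# (the name and package are illustrative placeholders)
mamba create -y -n tensorflow_env tensorflow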
A typical GPU job script using virtualenv will look similar. Some applications such as PyTorch are packaged with the necessary GPU libraries built-in, so it is not necessary to load any additional modules for GPU support. However, CUDA libraries are not always installed as part of a pip install, so it may be necessary to load the relevant cudnn module to make the CUDNN and CUDA libraries available in your virtual environment. Note that loading the cudnn module also loads a compatible cuda module.
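As a hedged sketch of the one-off setup (the venv directory name and the torch package are illustrative assumptions), the virtual environment might be created on a login node along these lines:
module load python
# Create and activate a virtual environment called venv in the current directory
python -m venv venv
source venv/bin/activate
# Install the required packages; torch is only an illustrative example
pip install torch
A job script using this virtual environment could then look like: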
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 12 # 12 cores per GPU
#$ -l h_rt=240:0:0 # 240 hours runtime
#$ -l h_vmem=7.5G # 7.5G RAM per core
#$ -l gpu=1 # request 1 GPU
#$ -l cluster=andrena # use the Andrena nodes and enable 12 cores per GPU
module load python cudnn
source venv/bin/activate
python train.py
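To confirm inside the job that the CUDA libraries and the allocated GPU are actually visible to the application, a quick check can be added before the training step; this is a sketch that assumes a PyTorch installation such as the one above:
# Report whether PyTorch can see the allocated GPU (assumes torch is installed)
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"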