PyTorch¶

PyTorch is an open source deep learning platform.

Versions¶

CPU and GPU versions of the PyTorch Python library are available, and each is installed differently.

GPU version is recommended

PyTorch typically runs much faster on a GPU. Researchers need to request permission to be added to the list of GPU node users.

It's worth visiting the PyTorch "Get Started" page, where you can find an interactive installation command generator.

GPU version¶

Installing with pip¶

PyTorch may be installed using pip in a virtualenv, which uses packages from the Python Package Index. The PyTorch binaries are packaged with the necessary libraries built in, so you do not need to load CUDA/cuDNN modules.

Initial setup:

module load python
virtualenv pytorch_env
source pytorch_env/bin/activate
pip install torch torchvision torchaudio
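
After the install finishes, a quick check from inside the activated virtualenv can confirm that the package imports. The following is a minimal sketch rather than part of the official setup; the function name `describe_torch_install` is hypothetical, and the script deliberately does not fail if torch is missing.

```python
import importlib.util


def describe_torch_install():
    """Return a one-line summary of the PyTorch install in the active environment."""
    # find_spec avoids an ImportError when torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    return f"torch {torch.__version__}, GPU visible: {torch.cuda.is_available()}"


if __name__ == "__main__":
    print(describe_torch_install())
```

On a node without a GPU, `torch.cuda.is_available()` returns `False` even when the GPU version is installed, so run the check inside a GPU job to confirm GPU visibility.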

Installing specific versions of PyTorch

To select a specific version, use the standard pip version specifier. For example, to install version 1.0.0, run pip install torch==1.0.0. Omitting the version number installs the latest release.
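
To confirm which versions pip actually installed, the standard library can report them without importing the packages. This is a small sketch; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the three packages installed by the commands on this page.
    for name in ("torch", "torchvision", "torchaudio"):
        print(name, installed_version(name))
```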

If you have additional Python package dependencies, install them into your virtualenv with further pip install commands, or preferably in bulk using a requirements file.

Subsequent activation as part of a GPU job:

module load python
source pytorch_env/bin/activate

Installing with Conda¶

Anaconda and Miniconda are no longer available on Apocrita due to licensing issues. Please use Miniforge instead.

PyTorch no longer has official Conda packages

There are no longer any officially supported PyTorch Conda packages. If you wish to use Conda with PyTorch, create a Conda environment as detailed below, then install PyTorch from PyPI into the activated environment using pip install.

If you prefer to use Conda environments, instructions are provided below. However, for simplicity the examples on this page will use pip.

Conda package availability and disk space

Conda tends to pull in a lot of packages, consuming more space than pip virtualenvs. Additionally, pip tends to have a wider range of third-party packages than Conda.

To use PyTorch in a Conda environment, choose a specific version of Python when you create the environment (check here for details of currently supported versions).

Initial setup using Python 3.12:

module load miniforge
mamba create -n pytorch_env python=3.12
mamba activate pytorch_env

Then, follow the "Installing with pip" instructions above in the activated environment:

pip install torch torchvision torchaudio

For alternative installation options, refer to the PyTorch "Get Started" page.

Subsequent activation as part of a GPU job:

module load miniforge
mamba activate pytorch_env

CPU-only version¶

The CPU version is slower, but can be useful for quick prototyping, and it creates a much smaller virtual environment. CPU-only code should not be run on the GPU nodes.

To install the CPU-only version, create a virtualenv or Conda environment as shown in the GPU examples above, then run the following command:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
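
To check which kind of build ended up in an environment, one option is to inspect `torch.version.cuda`, which is `None` for builds compiled without CUDA. The sketch below assumes only CPU and NVIDIA CUDA builds are in play; the function name `is_cpu_only_build` is hypothetical.

```python
import importlib.util


def is_cpu_only_build():
    """Return True/False for a CPU-only torch build, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # Builds compiled without CUDA support report no CUDA version.
    return torch.version.cuda is None


if __name__ == "__main__":
    print("CPU-only build:", is_cpu_only_build())
```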

Example jobs¶

GPU basic example¶

The job script assumes a virtual environment named pytorch_env in your home directory, containing the PyTorch GPU packages and set up as shown above.

#!/bin/bash
#$ -cwd
#$ -pe smp 8
#$ -l h_vmem=11G
#$ -l h_rt=240:0:0
#$ -l gpu=1

module load python
source ~/pytorch_env/bin/activate
python two_layer_net_tensor_gpu.py

Download a copy of the example PyTorch script by running:

wget https://raw.githubusercontent.com/sbutcher/pytorch-examples/master/tensor/two_layer_net_tensor_gpu.py

Submit the script to the job scheduler.
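
For orientation, the core pattern in a GPU script is to pick a device and create tensors on it. The following is an illustrative sketch, not the downloaded example script; `pick_device_name` and `run_small_matmul` are hypothetical names, and the tensor sizes are arbitrary.

```python
import importlib.util


def pick_device_name(cuda_available):
    """Choose the device string a script would pass to torch.device()."""
    return "cuda" if cuda_available else "cpu"


def run_small_matmul():
    """Multiply two random matrices on the chosen device and return the device used."""
    import torch

    device = torch.device(pick_device_name(torch.cuda.is_available()))
    # Creating tensors directly on the device avoids a separate host-to-GPU copy.
    x = torch.randn(64, 1000, device=device)
    w = torch.randn(1000, 10, device=device)
    y = x.mm(w)
    return str(y.device)


if __name__ == "__main__" and importlib.util.find_spec("torch") is not None:
    print("computed on", run_small_matmul())
```

Falling back to the CPU keeps the sketch runnable anywhere, but a job that requests a GPU should report a CUDA device here.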

GPU training example¶

This example uses the PyTorch transfer learning tutorial, which runs on a single GPU. It assumes an existing virtual environment named pytorch_env with the PyTorch and matplotlib packages installed. The following commands download the tutorial script and its dataset:

wget https://pytorch.org/tutorials/_downloads/07d5af1ef41e43c07f848afaf5a1c3cc/transfer_learning_tutorial.py
wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
mkdir data
unzip hymenoptera_data.zip -d data
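
Before submitting the job, it can help to confirm that the archive unpacked where the tutorial will look for it. The sketch below assumes the tutorial reads from `data/hymenoptera_data` with `train` and `val` folders, each containing `ants` and `bees`; check the downloaded script if the layout differs. The helper name `missing_dataset_dirs` is hypothetical.

```python
from pathlib import Path

# Assumed layout of the unzipped dataset (verify against the tutorial script).
EXPECTED_SUBDIRS = ("train/ants", "train/bees", "val/ants", "val/bees")


def missing_dataset_dirs(root="data/hymenoptera_data"):
    """Return the expected sub-directories that are not present under root."""
    root = Path(root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    print("dataset looks complete" if not missing else f"missing: {missing}")
```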

Create a job script using this GPU job template and submit with the qsub command:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_vmem=11G
#$ -l h_rt=240:0:0
#$ -l gpu=1

module load python
source ~/pytorch_env/bin/activate
python transfer_learning_tutorial.py

Checking that the GPU is being used correctly

If the job is running, the qstat command will show which node is being used. Running ssh <nodename> nvidia-smi will then query the GPU status on that node. You can also use the nvtools module to check that the GPU is being used correctly.
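
The same query can be scripted if you want to log GPU utilisation from within a job. This is a rough sketch that assumes it runs on the GPU node itself and that `nvidia-smi` is on the `PATH`; the function names are hypothetical, and values are kept as strings because some fields may be reported as unavailable.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Turn 'index, utilisation, memory' CSV lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, util, mem = [field.strip() for field in line.split(",")]
        rows.append({"index": index, "util_percent": util, "memory_used_mib": mem})
    return rows


def query_gpus():
    """Return one dict per GPU reported by nvidia-smi, or [] if the tool is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,utilization.gpu,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

A utilisation that stays at 0 while the job is running suggests the code is not using the GPU.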

It is possible to write PyTorch code for multiple GPUs, and also for hybrid CPU/GPU tasks, but do not request more than one GPU unless you can verify that your code uses all of them correctly.
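
One way to sanity-check multi-GPU use from inside your own code is to ask PyTorch how much tensor memory the current process holds on each visible device. This is only a sketch with hypothetical helper names; it covers memory allocated by this process through PyTorch, so treat it as a hint rather than proof of correct utilisation.

```python
import importlib.util


def gpu_memory_report():
    """Return (device index, bytes allocated by this process) for each visible GPU."""
    if importlib.util.find_spec("torch") is None:
        return []
    import torch

    # device_count() is 0 when no GPU is visible, giving an empty report.
    return [
        (index, torch.cuda.memory_allocated(index))
        for index in range(torch.cuda.device_count())
    ]


def idle_gpus(report):
    """List device indices on which this process has allocated no tensor memory."""
    return [index for index, allocated in report if allocated == 0]


if __name__ == "__main__":
    report = gpu_memory_report()
    print("visible GPUs:", len(report), "idle:", idle_gpus(report))
```

Calling this part-way through training and finding a device in the idle list indicates that the extra GPU is not being used.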

CPU-only example¶

The job script assumes a virtual environment named pytorchcpu containing the CPU-only PyTorch packages, set up as shown above.

#!/bin/bash
#$ -cwd
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load python
source ~/pytorchcpu/bin/activate
python two_layer_net_tensor_cpu.py

Download a copy of the example PyTorch script by running:

wget https://raw.githubusercontent.com/sbutcher/pytorch-examples/master/tensor/two_layer_net_tensor_cpu.py

Submit the script to the job scheduler.
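
For CPU jobs, it is usually sensible to keep PyTorch's thread count in line with the cores requested via -pe smp. The sketch below assumes the scheduler exports the granted core count in the `NSLOTS` environment variable, as Grid Engine does for parallel environment jobs; `slots_from_env` is a hypothetical helper.

```python
import importlib.util
import os


def slots_from_env(environ=os.environ, default=1):
    """Read the granted core count from NSLOTS, falling back to default."""
    value = environ.get("NSLOTS", "")
    # Ignore missing, non-numeric, or zero values.
    return int(value) if value.isdigit() and int(value) > 0 else default


if __name__ == "__main__" and importlib.util.find_spec("torch") is not None:
    import torch

    # Limit intra-op parallelism to the cores the job was actually given.
    torch.set_num_threads(slots_from_env())
    print("intra-op threads:", torch.get_num_threads())
```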
