Compiling C, C++ and Fortran code¶
On Apocrita we provide a number of compilers and interpreters for popular programming languages, which you can use to build and run your own project code. We also provide pre-built programs and software components, but you can use the compiler tools to build these for yourself.
This page focuses on the C, C++ and Fortran languages which are the most common compiled languages in use on the cluster. Other documentation pages exist for: Java, Julia, Python, R and Ruby.
Available compilers¶
A number of compiler suites, each offering C, C++ and Fortran compilers, are available on Apocrita:
- GCC
- Intel (part of Intel Parallel Studio XE)
- NVIDIA HPC SDK
Within a compiler suite the provided C compiler is a companion processor to the Fortran compiler in the sense of C interoperability.
Modules are available to set up the user environment giving access to these compilers. One version of the GCC compilers will be available without loading a module although this is typically a much earlier version than offered through the module system. Usually it is preferable to load the module for the latest release of each compiler.
GCC will usually give reliable results. However, depending on your code and libraries, the Intel or NVIDIA compilers may provide considerable performance improvements.
Compilation should be performed as job submissions or interactively via qlogin
in order not to impact the frontend nodes for other users. It is usually
advisable to compile code on machines of the same architecture as those it
will run on, so the appropriate node selection should be applied to these job
requests.
In general it makes sense to stick to the same compiler for the whole project. For C/C++ it should be possible to use different compilers for code which is then linked together but with Fortran code this is less easy.
We also provide older releases of the PGI Community Edition compiler suite. These are no longer licensed for compiling but remain available to allow previously compiled code to run. Modules for the PGI compilers are in our deprecated modules collection. The NVIDIA compiler suite is a drop-in replacement for the PGI suite.
Loading a compiler module¶
It is generally a good idea to be specific with your compiler version. Check
which modules you have loaded to be sure you have the right compiler and that
there are no conflicts. The available compiler versions can be viewed in the
devtools section of the output of the module avail command.
Check the available versions for the GCC compiler suite:
$ module avail gcc
gcc/4.8.5 gcc/6.3.0 gcc/7.1.0(default) gcc/8.2.0 gcc/10.2.0
Intel compiler version 2017.3 can be loaded with the command
module load intel/2017.3
You can test this by typing the command:
icc -V
This should return a short message reporting the compiler version:
Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.3.191 Build 20170404
Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
Often, you will require other libraries and headers that can be found in other modules. Unlike modules which provide many programs and tools, these library modules may have versions specific to a particular compiler suite. For example, for Open MPI:
$ module avail openmpi
openmpi/2.1.0-gcc openmpi/3.0.0-gcc(default)
openmpi/2.1.1-intel openmpi/3.1.2-gcc
Modules for use with a single compiler suite have an indicating suffix, such
as the -gcc and -intel seen here. For example, to use Open MPI with the
Intel compiler suite we would load modules as:
module load intel/2017.3 openmpi/2.1.1-intel
Again, check your loaded modules with
module list
If you don't specify a particular version, the version marked as default in
the output of the module avail command will be loaded.
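Because the suffix is only a naming convention, a mismatch between the compiler suite and a library module is easy to miss. The following is a minimal sketch of a check that could sit at the top of a build script. The function name module_matches_suite is hypothetical and is not something provided on Apocrita.

```shell
# module_matches_suite MODULE SUITE
# Hypothetical helper: succeeds if MODULE carries the suffix for SUITE
# (for example openmpi/2.1.1-intel matches the suite "intel").
module_matches_suite() {
    case "$1" in
        *-"$2") return 0 ;;
        *)      return 1 ;;
    esac
}

# Print the load command only when the suffix matches the chosen suite.
if module_matches_suite "openmpi/2.1.1-intel" "intel"; then
    echo "module load intel/2017.3 openmpi/2.1.1-intel"
fi
```

The check only inspects the module name; it does not query the module system itself.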
Using the compilers¶
Each of the compiler suites provides a C, C++ and a Fortran compiler. The name of the compiler command varies with the language and the compiler suite. For convenience the compiler suite modules set consistent environment variables by which the compilers may be referenced. The compiler names and variables are given in the following table:
Language | Variable | GCC | Intel | NVIDIA |
---|---|---|---|---|
C | CC | gcc | icc | nvc |
C++ | CXX | g++ | icpc | nvc++ |
Fortran | FC | gfortran | ifort | nvfortran |
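As a sketch of how these variables can keep a build script independent of the compiler suite, the snippet below builds a compile command from CC, CXX or FC. The function name compile_command and the fallback to the GCC command names are assumptions made for illustration.

```shell
# Use the variables set by the compiler suite module, falling back to the
# GCC command names when no module has set them (an assumption for this sketch).
CC="${CC:-gcc}"
CXX="${CXX:-g++}"
FC="${FC:-gfortran}"

# compile_command LANGUAGE SOURCE OUTPUT
# Hypothetical helper: prints (does not run) the compile command.
compile_command() {
    case "$1" in
        c)       echo "$CC -o $3 $2" ;;
        c++)     echo "$CXX -o $3 $2" ;;
        fortran) echo "$FC -o $3 $2" ;;
        *)       echo "unknown language: $1" >&2; return 1 ;;
    esac
}

compile_command fortran hello.f90 hello
```

Printing the command rather than running it makes it easy to confirm which compiler a loaded module has selected before starting a long build.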
Compilation for specific nodes¶
Different processors support different instruction sets which may provide a performance boost. During compilation the target instruction sets can be selected via CPU architecture flags. The required flags vary by compiler suite and are detailed below.
Supported Instruction Sets
Some instruction sets are not supported on all nodes, so you may need to compile different binaries for each node type.
Please see the table here for details on supported instruction sets.
Checking available CPU flags on a node¶
The CPU flags, including details of supported instruction sets, are listed
in the file /proc/cpuinfo on each Apocrita node. This file has a line
labelled flags.
For example, checking for all available CPU flags and supported instruction sets on an sdx node:
[sdx1.apocrita ~]$ grep flags /proc/cpuinfo | uniq
Checking for AVX2 instruction set support on an sdx node:
[sdx1.apocrita ~]$ awk '/flags.*avx2/ {print "AVX2 supported";exit}' /proc/cpuinfo
AVX2 supported
Checking for AVX-512F instruction set support on an smf node:
[smf1.apocrita ~]$ awk '/flags.*avx512f[$ ]/ {print "AVX-512F supported";exit}' /proc/cpuinfo
AVX-512F supported
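The checks above can be wrapped in a small reusable function. This is a minimal sketch: the name has_cpu_flag is hypothetical, and the optional second argument (an alternative file to read) is added here only to make the function easy to try out.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds if FLAG appears as a whole word on the first
# "flags" line, so that "avx512" does not match "avx512f".
has_cpu_flag() {
    awk -v flag="$1" '
        /^flags/ {
            for (i = 1; i <= NF; i++)
                if ($i == flag) found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/cpuinfo}"
}

if has_cpu_flag avx2; then
    echo "AVX2 supported"
fi
```

Comparing whole fields avoids the need for the trailing `[$ ]` pattern used in the AVX-512F example.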
All of the Apocrita compute nodes now support AVX2 and AVX512 at minimum, meaning that application crashes due to incompatible instruction sets are likely to be much rarer than in the past. Additionally, if you previously compiled your own code on an older node for maximum compatibility, you may wish to recompile on a newer node to gain any potential advantages from the newer architecture. If you have any questions regarding this, then it is worth getting in touch with the Research Software Engineers.
Selecting instruction sets during compilation¶
The instruction sets which may be targeted by the compiler can be selected
using a compile-time flag. When using the GCC compilers (gcc, g++ or
gfortran) the flag -march=<cpu_flag> targets the instruction set of the
given CPU type. When using this option the compiler may generate code which
will not run on other CPU types. Notably, the option -march=native targets
the instruction set of the CPU type running the compiler. This targeting
may be disabled using the option -mno-<cpu_flag> when compiling code.
To see what the GCC compiler will do with the -march=native option you can
use:
gcc -march=native -Q --help=target
Alternatively, the option -mtune=<cpu_flag> asks the compiler to tune the
produced code to the given CPU type without restricting the instruction set
to that of CPUs of that type.
The Intel compilers, as well as having -march (albeit with different
semantics), have a flag -xHost which requests targeting of the highest
instruction set available on the CPU type on which the compiler runs.
Processor incompatibilities with targeted code
Code produced with these options should provide a performance boost, but it is important to note that code optimised for a certain architecture may not run on other nodes, due to AMD/Intel differences, or lack of a certain feature in older processors.
You will need to build the code on the same type of node you will be
executing on (via qsub or qlogin sessions) to use the relevant processor
optimisation.
The NVIDIA compilers do not offer a -march option. Instead, the option
-tp=<target> is used.
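The differences between the suites can be summarised in a small lookup. This is a sketch only: the function name arch_flag is hypothetical, the Intel branch always returns -xHost, and the NVIDIA branch requires the caller to supply a target name taken from the NVIDIA documentation.

```shell
# arch_flag COMPILER [TARGET]
# Hypothetical helper: prints the architecture flag for COMPILER.
arch_flag() {
    compiler="$1"
    target="${2:-}"
    case "$compiler" in
        gcc|g++|gfortran)
            # GCC: default to the CPU type running the compiler.
            echo "-march=${target:-native}" ;;
        icc|icpc|ifort)
            # Intel: highest instruction set of the compiling host.
            echo "-xHost" ;;
        nvc|nvc++|nvfortran)
            # NVIDIA: no default is assumed here.
            [ -n "$target" ] || { echo "a -tp target is required" >&2; return 1; }
            echo "-tp=$target" ;;
        *)
            echo "unknown compiler: $compiler" >&2; return 1 ;;
    esac
}

echo "gcc $(arch_flag gcc) -o my_program my_program.c"
```

The final line prints the command rather than running it; my_program.c is a placeholder file name.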
More information on the compiler specific architecture flags is available in the vendor documentation:
Using GPU nodes with OpenMP¶
On Apocrita we support offloading to GPU devices using OpenMP with the GCC
compilers. If you have access to the GPU nodes you can compile and run
appropriate OpenMP programs, such as those using the target construct,
as described below.
OpenMP device offload with GCC compilers¶
To use OpenMP target offload on Apocrita with GCC, you will need to use version
10.2 or later. Offloading should then be automatically enabled when OpenMP
compilation is selected with the -fopenmp compiler option. For example, to
compile the source file offload-example.c which uses the target construct,
you can use:
module load gcc/10.2.0
gcc -fopenmp offload-example.c
The option -foffload=-lm is required to support the maths library on the
target device. If you see an error message like
unresolved symbol sqrtf
collect2: error: ld returned 1 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
then you will need to provide this option when compiling.
Although it is not necessary to compile the code on a GPU node to enable GPU offload, using the node on which you wish to run is advised when compiling.
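For readers without a test program to hand, the following sketch writes a small source file that uses the target construct and a maths library call, and defines a function to build it. The contents of offload-example.c and the function name build_offload_example are illustrative assumptions, not code shipped on Apocrita.

```shell
# Write a small test program (file name as in the example above).
cat > offload-example.c <<'EOF'
#include <math.h>
#include <stdio.h>

int main(void)
{
    float x = 2.0f, y = 0.0f;

    /* Runs on the GPU when one is available, otherwise on the host CPU. */
    #pragma omp target map(to: x) map(from: y)
    {
        y = sqrtf(x);
    }

    printf("sqrtf(%.1f) = %f\n", x, y);
    return 0;
}
EOF

# build_offload_example [OFFLOAD_OPTION]
# Compiles with OpenMP enabled. The offload option defaults to -foffload=-lm
# (maths library on the target device); the trailing -lm is for the host.
build_offload_example() {
    gcc -fopenmp "${1:--foffload=-lm}" -o offload-example offload-example.c -lm
}
```

After module load gcc/10.2.0, calling build_offload_example builds with offload enabled, while build_offload_example -foffload=disable builds a host-only binary.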
An OpenMP program compiled with offload enabled can be run in the same way
as with other programs. Offload happens automatically if a GPU is available
when a target construct is entered.
To disable offload so that the code with a target construct is run on the CPU
host instead of the GPU device, compile the program with -foffload=disable.
Equally, the code can be compiled without the -fopenmp option if OpenMP
is not required.
libgomp loader warnings on non-GPU nodes
If you run an OpenMP program with offload target regions on a node without a GPU you may see a warning like:
libgomp: while loading libgomp-plugin-nvptx.so.1: libcuda.so.1: cannot open shared object file: No such file or directory
These warnings occur because we provide a single compiler build to work on
all node types. Compiling programs with -foffload=disable will not
avoid such warnings. However, affected parallel regions will still run on
the host CPU and the warnings can be safely ignored.
Build systems¶
Typically, software for Linux comes with a build system with one of two flavours: GNU Autotools and CMake. Each of these typically uses the Make tool at a lower level.
On Apocrita the GNU Autotools system can be used without loading a module,
although it may be necessary to load an autotools-archive module to support
some additional macros. To use CMake it is necessary to load a cmake module.
For a project using GNU Autotools the general steps to build are as follows:
./configure [options]
make
First one runs a configuration command which creates a Makefile. One then runs
the make command that reads the Makefile and calls the necessary compilers,
linkers and such.
CMake is similar but, as well as supporting Makefiles, can also configure the
build system using Visual Studio projects, macOS Xcode projects and more.
Such projects can be identified by the presence of a CMakeLists.txt file.
GNU Autotools and CMake support out-of-tree source builds. Put another way, one can create a binary and all its associated support files in a directory that is not the same as the one with the source files. This can be quite advantageous when working with a source management tool like Git or SVN or when building the project supporting several different configurations, such as for debugging or targeting different node types.
To work with CMake with an out-of-tree build, start with creating a build directory in a different location:
$ pwd
/data/home/abc123/MySourceCode
$ mkdir ../MySourceCode_build
$ cd ../MySourceCode_build
$ cmake ../MySourceCode
Essentially, you enter the build directory and call cmake with the path to
the directory containing your CMakeLists.txt file. If you wish to
re-configure your build, you can use the program ccmake.
The end result is a Makefile. So to complete your build you type:
make
just as you would with the GNU Autotools setup.
Similarly, to use an out-of-tree build with GNU Autotools:
$ pwd
/data/home/abc123/MySourceCode
$ mkdir ../MySourceCode_build
$ cd ../MySourceCode_build
$ ../MySourceCode/configure
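Out-of-tree builds make it straightforward to keep one build directory per configuration. The following is a minimal sketch for CMake projects: the function name configure_out_of_tree and the directory naming scheme are assumptions, while CMAKE_BUILD_TYPE is the standard CMake variable for choosing a configuration such as Debug or Release.

```shell
# configure_out_of_tree SRC_DIR CONFIG
# Hypothetical helper: creates a sibling build directory named after CONFIG
# and prints its path. SRC_DIR should be an absolute path. CMake is run only
# if it is on the PATH and SRC_DIR contains a CMakeLists.txt file.
configure_out_of_tree() {
    src="$1"
    config="$2"
    build="${src}_build_${config}"
    mkdir -p "$build" || return 1
    if command -v cmake >/dev/null 2>&1 && [ -f "$src/CMakeLists.txt" ]; then
        # Send CMake's own output to stderr so only the path goes to stdout.
        (cd "$build" && cmake -DCMAKE_BUILD_TYPE="$config" "$src" >&2) || return 1
    fi
    echo "$build"
}
```

For example, configure_out_of_tree /data/home/abc123/MySourceCode Debug would create and configure /data/home/abc123/MySourceCode_build_Debug, after which make can be run in that directory.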
To learn more about GNU Autotools, CMake, and Makefiles follow the links below:
- GNU Autotools - FAQ
- GNU Make - Writing Makefiles
- Makefile Wikipedia article
- CMake Webpage
- CMake Wikipedia article
Optional libraries for HPC¶
MPI¶
The Message Passing Interface is a protocol for parallel computation often used in HPC applications. On Apocrita we have the distinct implementations Intel MPI and Open MPI available. In general we recommend the use of Intel MPI where suitable.
The module system allows the user to select the implementation of MPI to be used, and the version. With Open MPI, as noted above, one must be careful to load a module compatible with the compiler suite being used.
To load the default (usually latest) Intel MPI module:
module load intelmpi
To set up the Open MPI environment, version 3.0.0, suitable for use with the GCC compiler suite:
module load openmpi/3.0.0-gcc
For each implementation, several versions may be available. The default version is usually set to the latest release: an explicit version number is required to load a different version.
Default module for Open MPI
The command module load openmpi loads the default Open MPI module, which is
openmpi/3.0.0-gcc. This default module is specific to the GCC compiler suite,
so to access an MPI implementation compatible with a different compiler suite
a specific module name must be given.
To build a program using MPI it is necessary for the compiler and linker to be able to find the header and library files. As a convenience, the MPI environment provides wrapper scripts to the compiler, each of which sets the appropriate flags for the compiler. The name of each wrapper script depends on the implementation and the target compiler.
Open MPI¶
For each Open MPI module, and the implementation provided by the NVIDIA compiler suite module, the wrapper scripts are consistently named for each language. These are given in the table below:
Language | Script |
---|---|
C | mpicc |
C++ | mpic++ |
Fortran | mpif90 |
As an example, a Fortran MPI program may be compiled as
module load openmpi/3.0.0-gcc
mpif90 -o hello hello.f90
rather than requiring the addition of numerous include and linker path flags:
gfortran -o hello hello.f90 -I... -L... -l...
The Open MPI wrapper scripts provide an option -show which details the
final invocation of the compiler:
$ module load openmpi/3.0.0-gcc
$ mpif90 -show -o hello hello.f90
gfortran -o hello hello.f90 ...
No Open MPI module is provided for use with the NVIDIA compiler suite. Instead, the installed NVIDIA compiler environment provides an Open MPI implementation and the NVIDIA compiler module contains the appropriate settings:
$ module purge; module load nvidia-hpc-sdk/21.3
$ type mpif90
mpif90 is /share/apps/centos7/nvidia-hpc-sdk/2021_213/Linux_x86_64/21.3/comm_libs/mpi/bin/mpif90
Intel MPI¶
In contrast, the Intel MPI implementation supports both the Intel and GCC compiler suites in the same module. As with Open MPI, wrapper scripts are provided, but their names depend on the target compiler suite as well as the language. The wrapper script names are given in the following table:
Language | Compiler suite | Script |
---|---|---|
C | GCC | mpicc |
C | Intel | mpiicc |
C++ | GCC | mpic++ |
C++ | Intel | mpiicpc |
Fortran | GCC | mpif90 |
Fortran | Intel | mpiifort |
The scripts can be used as in the Open MPI example above:
$ module load intelmpi
$ mpif90 -show -o hello hello.f90
gfortran -o 'hello' 'hello.f90' ...
$ mpiifort -show -o hello hello.f90
ifort -o 'hello' 'hello.f90' ...
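In a build script that has to work with either compiler suite, the table above can be encoded as a lookup. This is a sketch only, and the function name intelmpi_wrapper is hypothetical.

```shell
# intelmpi_wrapper LANGUAGE SUITE
# Hypothetical helper: prints the Intel MPI wrapper script for the given
# language (c, c++ or fortran) and compiler suite (gcc or intel).
intelmpi_wrapper() {
    case "$1:$2" in
        c:gcc)         echo mpicc ;;
        c:intel)       echo mpiicc ;;
        c++:gcc)       echo mpic++ ;;
        c++:intel)     echo mpiicpc ;;
        fortran:gcc)   echo mpif90 ;;
        fortran:intel) echo mpiifort ;;
        *) echo "no Intel MPI wrapper for $1 with $2" >&2; return 1 ;;
    esac
}

# Print the compile command for a Fortran program with the Intel compilers.
echo "$(intelmpi_wrapper fortran intel) -o hello hello.f90"
```

Any other suite name makes the function fail with an error message.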
Matching versions of Intel MPI and Intel compiler
In general we recommend that, when using Intel MPI with the Intel
compilers, you match the versions of the modules. For example, if using
module load intel/2017.3 then you should also use
module load intelmpi/2017.3. However, there are times where it is
necessary or desirable to use a different version of Intel MPI. In these
cases you should load the Intel MPI module after loading the compiler
module: module load intel/2017.3 intelmpi/2018.3.
There is no support for the NVIDIA compilers in the Intel MPI implementation.
Compiling and testing¶
If make succeeds, you should see various calls being printed on your screen with the name of the compiler you chose. If compilation completed successfully you should see a success message of some kind, and an executable appear in your source or build directory.
Quite often, software comes with test programs you can also build. Often, the command to do this looks like the following:
make test
Optimisation¶
Software optimisation comes in many forms, such as compiler optimisation, using alternate libraries, removing bottlenecks from code, algorithmic improvements, and using parallelisation. Using processor-specific compiler options may reduce universal compatibility of your compiled code, but could yield substantial improvements.
The Intel, NVIDIA and GCC compilers may give different performance depending on different libraries or processor optimisation. Benchmarking and comparing code compiled with each compiler is recommended.
Profiling tools¶
Once you have a running program that has been tested, there are several tools you can use to check the performance of your code. Some of these you can use on the cluster and some you can use on your own desktop machine.
perf¶
perf is a tool that creates a log of where your program spends its time.
The report can be used as a guide to see where you need to focus your time
when optimising code. Once the program has been compiled, it should be run
through the record subcommand of perf:
perf record -a -g my_program
where my_program is the name of the program to be profiled. Once the
program has run, a log file (perf.data by default) is generated. This log
file may be analysed with the report subcommand of perf. For example, to
display the recorded samples grouped by command and shared object:
perf report --sort comm,dso
More information on perf can be found at this Profiling how-to and this extensive tutorial.
valgrind¶
valgrind is a suite of tools that allow you to improve the speed and reduce the memory usage of your programs. An example command would be:
valgrind --tool=memcheck <myprogram>
Valgrind is well suited for multi-threaded applications, but may not be suitable for longer running applications due to the slowdown incurred by the profiled application. In addition, there is a graphical tool which is not offered on the cluster but will work on Linux desktops. There is also an extensive manual.
Python profiling tools¶
The above tools work best for compiled binaries. If you are writing code
in Python, cProfile and line_profiler are useful options.
Optimisations for slow-running Python code include parallelisation with
multiprocessing or dask to use multiple cores efficiently, and compilers
such as pythran or numba.
For more details, High Performance Python by Micha Gorelick and Ian Ozsvald
is available to QMUL staff and students.