A64FX - Fujitsu

All documentation for the A64FX Isambard hackathon on March 23/24 2021 can be found here.

The A64FX system is an HPE Apollo 80 cabinet, available since February 2021.

  • Fujitsu A64FX processors @ 1.8 GHz, 72 nodes
    • 48 ARMv8.2 cores with 512-bit SVE

    • 32 GB HBM2 memory arranged in 4 core memory groups (CMGs) with 12 cores and 8 GB each

    • 64 KB private L1 cache per core, 8 MB shared L2 cache per CMG

    • Mellanox Infiniband

  • Red Hat Enterprise Linux 8
    • Cray Programming Environment

  • Dedicated Lustre filesystem for job scratch directories: /scratch

  • Shared Lustre filesystem with XCI & MACS: /home, /projects, /software
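Given the 4-CMG layout above, a common starting point for hybrid codes is one MPI rank per CMG with 12 OpenMP threads each, so that a rank's threads stay within one memory group. A minimal sketch of the per-rank environment settings (the values here are illustrative, not site defaults):

```shell
# Hypothetical OpenMP settings for one MPI rank per CMG:
# 12 threads, each pinned to its own core inside the CMG.
export OMP_NUM_THREADS=12   # one thread per core in a CMG
export OMP_PLACES=cores     # pin threads to physical cores
export OMP_PROC_BIND=close  # keep a rank's threads on adjacent cores
```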

Running Jobs

The system uses PBS Pro and the queue for the A64FX nodes is a64fx. It can only be used from the A64FX login nodes.

A job is requested as follows:

qsub -q a64fx -lselect=N:ncpus=48,place=scatter ...

…where N is the number of nodes required.


By default, the Cray programming environment is loaded. A64FX-specific modules are exposed from /lustre/software/aarch64/modulefiles.

The Bristol HPC group also maintains a shared modules space where you may find additional useful tools, but keep in mind that these may not always be up-to-date. To use it: module use /lustre/projects/bristol/modules-a64fx/modulefiles.

Note that this system resets modules when starting a job. Make sure that your job scripts load all the modules needed to run your application.
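Putting the submission options and the module-reset note together, a job script might look like the following sketch (the walltime, node count, loaded modules, and the binary name `myapp` are placeholders, and the `mpirun` launch line is an assumption, not a site-mandated invocation):

```shell
#!/bin/bash
#PBS -q a64fx
#PBS -l select=2:ncpus=48,place=scatter
#PBS -l walltime=00:10:00

# Modules are reset at job start, so load everything the application needs here.
module use /lustre/projects/bristol/modules-a64fx/modulefiles
module load cray-mvapich2_noslurm_nogpu

cd "$PBS_O_WORKDIR"
mpirun -np 96 ./myapp   # hypothetical binary; 2 nodes x 48 cores
```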


All major HPC compilers targeting AArch64 are available through modules:

  • Arm Compiler for Linux:
    • tools/arm-compiler-a64fx/21.0

    • tools/arm-compiler-a64fx/20.3

  • Cray Compilation Environment:
    • Classic frontend, targeting SVE: cce-sve/10.0.3

    • Clang frontend, no SVE support: cce/10.0.3; a newer Clang frontend with better C++ support in cce/14.0.1

  • Fujitsu Compiler, optimised for A64FX:
    • fujitsu-compiler/4.3.1

  • GNU Compiler Collection:
    • gcc/11-*: snapshots of the development version (experimental, but with support for A64FX)

    • gcc/10.2.0

    • gcc/8.1.0
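Each of these compilers takes different flags to target the A64FX and enable SVE. The lines below are a sketch of commonly used invocations; the flag choices are suggestions, not site-mandated options:

```shell
# GCC 11 snapshots know the A64FX pipeline directly:
gcc -O3 -mcpu=a64fx app.c -o app
# Arm Compiler for Linux (armclang) also accepts -mcpu=a64fx:
armclang -O3 -mcpu=a64fx app.c -o app
# The Cray cc driver targets the loaded CPU module;
# the cce-sve module enables SVE code generation:
cc -O3 app.c -o app
# Fujitsu compiler with its aggressive optimisation bundle:
fcc -Kfast app.c -o app
```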


The default MPI library is Cray MVAPICH2, available through the cray-mvapich2_noslurm_nogpu module. It can be used with the GNU and Cray compilers.

The Bristol modules space has builds of Open MPI with UCX:

  • openmpi/4.1.0/gcc-11.0 (also works with GCC 10.2)

  • openmpi/4.1.0/arm-21.0

  • openmpi/4.1.0/arm-20.3
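With one of these modules loaded, building and launching follows the usual Open MPI pattern. A sketch, where `app.c` is a placeholder source file and the rank count assumes a two-node, 48-cores-per-node allocation:

```shell
module use /lustre/projects/bristol/modules-a64fx/modulefiles
module load openmpi/4.1.0/gcc-11.0

mpicc -O3 -mcpu=a64fx app.c -o app   # mpicc wraps the matching gcc
mpirun -np 96 ./app                  # hypothetical 2-node run
```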

There are older builds of Open MPI without UCX, but these can only be used for single-node jobs:

  • openmpi/4.0.4/gcc-11.0 (also works with GCC 10.2)

  • openmpi/4.0.4/arm-20.3


Current versions of Cray HDF5 Parallel (cray-hdf5-parallel) require the following modules and environment variables in order to work.

> module load cray-mvapich2_noslurm_nogpu
> module load cray-hdf5-parallel
> export PKG_CONFIG_PATH=/projects/bristol/hackathon/fix/cray-hdf5-parallel-mvapich:${PKG_CONFIG_PATH}

> cc HDF5_File_create.c -o HDF5_File_create
> ./HDF5_File_create
Warning: Process to core binding is enabled and OMP_NUM_THREADS is greater than one (48).
If your program has OpenMP sections, this can cause over-subscription of cores and consequently poor performance
To avoid this, please re-run your application after setting MV2_ENABLE_AFFINITY=0
Use MV2_USE_THREAD_WARNING=0 to suppress this message
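As the warning itself suggests, disabling MVAPICH's process-to-core binding avoids over-subscription when mixing MPI with OpenMP. A sketch of the relevant environment settings, to be exported before launching the application:

```shell
# Let OpenMP manage thread placement instead of MVAPICH's core binding.
export MV2_ENABLE_AFFINITY=0
# Optionally silence the reminder once affinity is handled explicitly:
export MV2_USE_THREAD_WARNING=0
```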

Fixes are planned for later versions of Cray PE.


Fujitsu’s website includes an A64FX Datasheet and Microarchitecture Manual. The architecture manual can also be found on GitHub.