A64FX - Fujitsu

All documentation for the A64FX Isambard hackathon on March 23/24 2021 can be found here.

The A64FX system is an HPE Apollo 80 cabinet, available since February 2021.

  • Fujitsu A64FX processors @ 1.8 GHz, 72 nodes
    • 48 ARMv8.2 cores with 512-bit SVE

    • 32 GB HBM2 memory arranged in 4 core memory groups (CMGs) with 12 cores and 8 GB each

    • 64 KB private L1 cache per core, 8 MB shared L2 cache per CMG

    • Mellanox Infiniband

  • Red Hat Enterprise Linux 8
    • Cray Programming Environment

  • Dedicated Lustre filesystem for job scratch directories: /scratch

  • Shared Lustre filesystem with XCI & MACS: /home, /projects, /software
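Given the 4-CMG layout above, a common starting point for hybrid codes is one MPI rank per CMG with 12 OpenMP threads each, so that a rank's threads stay within one memory group. A minimal sketch of the per-rank environment settings (the values here are illustrative, not site defaults):

```shell
# Hypothetical OpenMP settings for one MPI rank per CMG:
# 12 threads, each pinned to its own core inside the CMG.
export OMP_NUM_THREADS=12   # one thread per core in a CMG
export OMP_PLACES=cores     # pin threads to physical cores
export OMP_PROC_BIND=close  # keep a rank's threads on adjacent cores
```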

Running Jobs

The system uses PBS Pro and the queue for the A64FX nodes is a64fx. It can only be used from the A64FX login nodes.

A job is requested as follows:

qsub -q a64fx -lselect=N:ncpus=48,place=scatter ...

…where N is the number of nodes required.


By default, the Cray programming environment is loaded. A64FX-specific modules are exposed from /lustre/software/aarch64/modulefiles.

The Bristol HPC group also maintains a shared modules space where you may find additional useful tools, but keep in mind that these may not always be up-to-date. To use it: module use /lustre/projects/bristol/modules-a64fx/modulefiles.

Note that this system resets modules when starting a job. Make sure that your job scripts load all the modules needed to run your application.
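Putting the submission options and the module-reset note together, a job script might look like the following sketch (the walltime, node count, loaded modules, and the binary name `myapp` are placeholders, and the `mpirun` launch line is an assumption, not a site-mandated invocation):

```shell
#!/bin/bash
#PBS -q a64fx
#PBS -l select=2:ncpus=48,place=scatter
#PBS -l walltime=00:10:00

# Modules are reset at job start, so load everything the application needs here.
module use /lustre/projects/bristol/modules-a64fx/modulefiles
module load cray-mvapich2_noslurm_nogpu

cd "$PBS_O_WORKDIR"
mpirun -np 96 ./myapp   # hypothetical binary; 2 nodes x 48 cores
```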


All major HPC compilers targeting AArch64 are available through modules:

  • Arm Compiler for Linux:
    • tools/arm-compiler-a64fx/21.0

    • tools/arm-compiler-a64fx/20.3

  • Cray Compilation Environment:
    • Classic frontend, targeting SVE: cce-sve/10.0.3

    • Clang frontend, no SVE support: cce/10.0.3; a newer Clang frontend with better C++ support in cce/14.0.1

  • Fujitsu Compiler, optimised for A64FX:
    • fujitsu-compiler/4.3.1

  • GNU Compiler Collection:
    • gcc/11-*: snapshots of the development version (experimental, but with support for A64FX)

    • gcc/10.2.0

    • gcc/8.1.0
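Each of these compilers takes different flags to target the A64FX and enable SVE. The lines below are a sketch of commonly used invocations; the flag choices are suggestions, not site-mandated options:

```shell
# GCC 11 snapshots know the A64FX pipeline directly:
gcc -O3 -mcpu=a64fx app.c -o app
# Arm Compiler for Linux (armclang) also accepts -mcpu=a64fx:
armclang -O3 -mcpu=a64fx app.c -o app
# The Cray cc driver targets the loaded CPU module;
# the cce-sve module enables SVE code generation:
cc -O3 app.c -o app
# Fujitsu compiler with its aggressive optimisation bundle:
fcc -Kfast app.c -o app
```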


The default MPI library is Cray MVAPICH2, available through the cray-mvapich2_noslurm_nogpu module. It can be used with the GNU and Cray compilers.

The Bristol modules space has builds of Open MPI with UCX:

  • openmpi/4.1.0/gcc-11.0 (also works with GCC 10.2)

  • openmpi/4.1.0/arm-21.0

  • openmpi/4.1.0/arm-20.3
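With one of these modules loaded, building and launching follows the usual Open MPI pattern. A sketch, where `app.c` is a placeholder source file and the rank count assumes a two-node, 48-cores-per-node allocation:

```shell
module use /lustre/projects/bristol/modules-a64fx/modulefiles
module load openmpi/4.1.0/gcc-11.0

mpicc -O3 -mcpu=a64fx app.c -o app   # mpicc wraps the matching gcc
mpirun -np 96 ./app                  # hypothetical 2-node run
```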

There are older builds of Open MPI without UCX, but these can only be used for single-node jobs:

  • openmpi/4.0.4/gcc-11.0 (also works with GCC 10.2)

  • openmpi/4.0.4/arm-20.3


Current versions of Cray HDF5 Parallel (cray-hdf5-parallel) require the following modules and environment variables in order to work.

> module load cray-mvapich2_noslurm_nogpu
> module load cray-hdf5-parallel
> export PKG_CONFIG_PATH=/projects/bristol/hackathon/fix/cray-hdf5-parallel-mvapich:${PKG_CONFIG_PATH}

> cc HDF5_File_create.c -o HDF5_File_create
> ./HDF5_File_create
Warning: Process to core binding is enabled and OMP_NUM_THREADS is greater than one (48).
If your program has OpenMP sections, this can cause over-subscription of cores and consequently poor performance
To avoid this, please re-run your application after setting MV2_ENABLE_AFFINITY=0
Use MV2_USE_THREAD_WARNING=0 to suppress this message
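As the warning itself suggests, disabling MVAPICH's process-to-core binding avoids over-subscription when mixing MPI with OpenMP. A sketch of the relevant environment settings, to be exported before launching the application:

```shell
# Let OpenMP manage thread placement instead of MVAPICH's core binding.
export MV2_ENABLE_AFFINITY=0
# Optionally silence the reminder once affinity is handled explicitly:
export MV2_USE_THREAD_WARNING=0
```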

Fixes are planned for later versions of Cray PE.


Fujitsu’s website includes an A64FX Datasheet and Microarchitecture Manual. The architecture manual can also be found on GitHub.