MACS Major Upgrade to RHEL8

The GW4 Isambard Multi-Architecture Comparison System will be unavailable for the week of 19th April to perform planned upgrades & maintenance of the software stack.

This is a major software upgrade to Red Hat Enterprise Linux 8, which brings the operating system major version in line with the A64FX service. It will provide a better base for software development and improve MACS compatibility with scientific software, including the Cray software stack.

Some user software may run into compatibility issues because of changed or updated libraries, so you may need to recompile your code to continue running on MACS.

XCI & A64FX remain available during this time.

Isambard A64fx hackathon, March 23-24 - "Full steam ahead!"

A two-day hackathon to focus on porting and optimising HPC codes to the latest Arm-based processor: Fujitsu’s A64fx

Tue, 23 Mar 2021, 09:30 – Wed, 24 Mar 2021, 17:00 GMT

https://www.eventbrite.co.uk/e/the-3rd-isambard-hackathon-full-steam-ahead-tickets-143718930189

Hackathon Description

The third in the Isambard hackathon series will take place online via Zoom, and focus on porting and optimising codes to the processor behind Fugaku, the fastest supercomputer in the world - the Arm-based A64fx from Fujitsu. Isambard now includes a rack of 72 A64fx processors, along with Cray, Arm and GNU compilers.

The hackathon will begin with a training session from Arm and Cray/HPE on the architecture and software tools, before moving into the hands-on part of the hackathon. All attendees will be given accounts on the Isambard A64fx Apollo 80 system for the duration of the event. Accounts may be available on request to continue porting activities after the event. For more details about the system, see:

https://gw4-isambard.github.io/docs/

Schedule (all times GMT)

Tuesday March 23rd:

09:30-09:40: Welcome and introductions (Prof. Simon McIntosh-Smith, Isambard PI)

09:40-11:30: An introduction to the A64fx architecture, including SVE and software tools (Phil Ridley, Arm)

11:30-11:45: Break

11:45-12:45: The A64fx software environment on the HPE Apollo 80 (John Levesque, Cray/HPE)

12:45-14:00: Lunch

14:00-17:00: Hands-on hackathon, supported by Isambard GW4, Arm and Cray/HPE staff

Wednesday March 24th:

09:30-11:00: Review of Tuesday’s session and hackathon continuation

11:00-11:30: Break

11:30-13:00: Hackathon cont.

13:00-14:00: Lunch

14:00-17:00: Hackathon cont.

17:00: Hackathon wrap-up and next steps.

Zoom meeting details

These will be emailed to you as part of your tickets.

Previous Isambard Hackathons

Isambard has run two previous hackathons which were extremely successful. Run in October 2017 and March 2018, they were some of the very first public events porting HPC codes to production Arm hardware. This third Isambard hackathon is one of the very first such public events targeting the A64fx, and the first in Europe.

[Photo: the first and second Isambard hackathons]

The Isambard 2 A64fx Apollo 80 system includes 72 A64fx CPUs, an InfiniBand interconnect, and the Cray/HPE, Arm and GNU compilers for Arm. Hackathon attendees will be given training accounts on the system for the hackathon, and may be permitted to retain access after the hackathon to continue porting and optimisation activities.

Isambard is a UK National Tier-2 service, funded by EPSRC (EP/P020224/1). Isambard is run by the GW4 Alliance of the universities of Bristol, Bath, Cardiff and Exeter, along with the UK’s Met Office.

Previous results from Isambard were published at the Cray User Group workshops in 2018 and 2019, winning the best paper award in 2019:

McIntosh‐Smith, Simon, James Price, Tom Deakin, and Andrei Poenaru. “A performance analysis of the first generation of HPC‐optimized Arm processors.” Concurrency and Computation: Practice and Experience 31, no. 16 (2019): e5110.

https://doi.org/10.1002/cpe.5110

McIntosh‐Smith, Simon, James Price, Andrei Poenaru, and Tom Deakin. “Benchmarking the first generation of production quality Arm‐based supercomputers.” Concurrency and Computation: Practice and Experience 32, no. 20 (2020): e5569.

https://doi.org/10.1002/cpe.5569

Cray Compiler Environment (CCE) 9.0.0 installed

Cray CCE 9.0.0 has been installed on XCI; feel free to test it out by loading the cdt/19.06 module!

This is a major revision of CCE, with the compilers now based on LLVM.

Documentation can be found here: https://pubs.cray.com/content/S-5212/9.0/cray-compiling-environment-cce-release-overview/cce-900-release-overview-introduction

XCI: Huge page bug fixed

Cray has deployed the first monthly patchset on XCI, which includes a fix for the out-of-memory errors that some jobs using huge pages have experienced.

Isambard XC50 software & chip updates

Next week (w/c 11th March), we will be shutting down XCI to upgrade both the hardware and system software. This process will likely take the full week, and so we expect to resume service from Monday 18th March.

These upgrades will move us to Cray Linux Environment 7.0, which bumps the underlying operating system from SLES 12 to SLES 15. This may cause existing dynamically linked executables to fail to run, so we recommend recompiling any such programs once the upgrade is complete. Note that after the upgrade the oldest available Cray Developer Toolkit will be CDT 19.03, so if your code was built with an earlier CDT you may encounter some issues when recompiling.
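As a rough aid for finding which of your programs are dynamically linked (and so are candidates for recompiling), the Python sketch below checks whether an ELF executable requests a dynamic loader, which is recorded as a `PT_INTERP` program header. It is not an official Isambard tool: it assumes 64-bit ELF files, and the function names `is_dynamically_linked` and `check_file` are hypothetical.

```python
import struct
import sys

PT_INTERP = 3  # program header type that names the dynamic loader (e.g. ld-linux)


def is_dynamically_linked(data: bytes) -> bool:
    """Return True if a 64-bit ELF image contains a PT_INTERP program header."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("this sketch only handles 64-bit ELF files")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    # ELF64 header: e_phoff is 8 bytes at offset 32;
    # e_phentsize and e_phnum are 2 bytes each at offsets 54 and 56.
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    for i in range(e_phnum):
        # p_type is the first 4-byte field of each program header entry.
        (p_type,) = struct.unpack_from(endian + "I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def check_file(path: str) -> bool:
    """Hypothetical helper: read an executable from disk and classify it."""
    with open(path, "rb") as f:
        return is_dynamically_linked(f.read())


if __name__ == "__main__":
    # Usage (script name is hypothetical): python3 check_dynamic.py ./my_app ./other_app
    for path in sys.argv[1:]:
        kind = "dynamic (recompile after upgrade)" if check_file(path) else "static"
        print(f"{path}: {kind}")
```

Checking for `PT_INTERP` rather than just the presence of shared-library references keeps the test simple: a statically linked executable has no loader entry and should keep running, while anything that names a loader will pick up the new SLES 15 system libraries at run time.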

Updates

14 March: The upgrade is proceeding smoothly; XCI is running its new software stack on fresh chips!

We are now working to ensure that the service is stable and correctly configured for user access, and that any hardware bugs are caught early.

XCI: System update

Cray CDT/18.12 installed as module cdt/18.12

Cray CDT/18.11 remains available

Arm Compiler version 19 installed as module PrgEnv-allinea

Arm Compiler version 18.4.2 is also available

GCC 8.2.0 installed as module gcc/8.2.0

GCC 7.3.0 & GCC 6.1.0 are also available.

All of the new modules have been set as the default versions, which means they will be loaded if you omit the version number from the module name.
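The default-version behaviour can be pictured with a small toy model. This is a hypothetical Python sketch for illustration only, not how Environment Modules or Lmod actually implement defaults; the `AVAILABLE` table and `resolve` function are invented names, and the versions simply mirror the list above.

```python
# Hypothetical toy model of default-version resolution; NOT the real
# Environment Modules/Lmod implementation. Versions mirror the list above.
AVAILABLE = {
    "cdt": {"versions": ["18.11", "18.12"], "default": "18.12"},
    "gcc": {"versions": ["6.1.0", "7.3.0", "8.2.0"], "default": "8.2.0"},
}


def resolve(name: str) -> str:
    """Return the full 'name/version' that a load request would pick."""
    if "/" in name:
        # An explicit version is used as-is, provided it is installed.
        pkg, version = name.split("/", 1)
        if version not in AVAILABLE[pkg]["versions"]:
            raise KeyError(f"{name} is not installed")
        return name
    # No version given: fall back to the version marked as the default.
    return f"{name}/{AVAILABLE[name]['default']}"


print(resolve("gcc"))        # gcc/8.2.0
print(resolve("gcc/7.3.0"))  # gcc/7.3.0
print(resolve("cdt"))        # cdt/18.12
```

In practice this means `module load gcc` now picks up GCC 8.2.0, while `module load gcc/7.3.0` keeps you on the older compiler.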

New documentation!

New Isambard user documentation source! => https://gw4-isambard.github.io/docs/

If you want to contribute, we’re happy to review pull requests, or you can simply email changes to your local GW4 SysAdmin!

XC50 Approaches...

The single cabinet consists of approximately 164 compute nodes of 64 cores each, for a total of 10,496 Cavium ThunderX2 Armv8 cores, backed by the same Aries interconnect. A 0.5 petabyte Lustre filesystem is dedicated to the Isambard system.

Discussions on acceptance tests are underway; we expect to run HPL (LINPACK), HPCG, STREAM, MPI and I/O benchmarks. Some practical codes will also be run for comparison against the numbers produced on the Early Access nodes (http://www.goingarm.com/slides/2017/SC17/GoingArm_SC17_Bristol_Isambard.pdf), including UM/NEMO, a chemistry code and an engineering code.

The HPC group at Bristol Uni has recently published a paper that examines these numbers in more depth: https://uob-hpc.github.io/assets/cug-2018.pdf

Isambard Hackathon #2 "Stoking The Fire"

The second Isambard hackathon, “Stoking The Fire”, ended yesterday after a day and a half of hacking on the second half of the top 10 most common ARCHER codes, plus UM/NEMO. In attendance were specialists for each code, as well as Cray compiler developers and Cavium engineers.

I was really pleased to see the system handling the workload without incident and surprised that this time not a single node crashed! That alone makes the week it took to upgrade the ARM chips worth it!

We found bugs in some of the codes being run, and even discovered a couple of compiler and performance tool bugs, which were raised with Cray. At this early stage, every bug found is a step towards bringing the platform closer to operational quality!

I spent my time helping users, applying hotfixes, investigating enhancement requests and eating the sandwiches.

The following codes were run successfully:

  • CASTEP
  • OpenFOAM
  • SBLI
  • UM
  • NEMO
  • Molpro
  • NAMD
  • Hydro3D
  • Bookleaf

[Photo: Isambard Hackathon #2 "Stoking The Fire", 2018]