Running jobs
Isambard uses the PBS Pro scheduler to manage compute resources and run jobs according to “fairshare” rather than fixed allocations.
The MACS, XCI & A64FX systems use separate schedulers; jobs must be submitted from the login nodes of the relevant system.
Limits
Users can submit any number of jobs, but only two jobs per user, per queue, will run at the same time.
All jobs default to a 24-hour walltime; setting a lower walltime in your job script increases the chance of your job starting sooner.
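For example, a minimal job script requesting a 2-hour walltime rather than the 24-hour default (the queue name and node count here are illustrative):

```shell
#!/bin/bash
#PBS -q arm
#PBS -l select=1
# Shorter walltimes make jobs easier for the scheduler to backfill
#PBS -l walltime=02:00:00
hostname
```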
Queue configuration
arm - Run on XCI Marvell Thunder X2 nodes
arm-dev - Run interactively on up to 4x XCI Thunder X2 nodes
romeq - Run on MACS 4x AMD Epyc “Rome” 7742 nodes
clxq - Run on MACS 4x Intel Xeon 6230 “Cascade Lake” (CLX) nodes
voltaq - Run on MACS 4x Nvidia Tesla V100 “Volta” GPU nodes
pascalq - Run on MACS 4x dual-card Nvidia Tesla P100 “Pascal” GPU nodes
knlq - Run on MACS 8x Intel Xeon Phi “Knights Landing” 7210 CPU nodes
powerq - Run on MACS 2x IBM Power 9 nodes, each with dual-card Nvidia V100 “Volta” GPUs (queue unavailable, interactive use only; hosts: power-001, power-002)
knlq is split into two MCDRAM configurations: nodes 001-004 are in cache memory mode (quad_0) and nodes 005-008 are in flat memory mode (quad_100). These modes can be targeted using the aoe= PBS attribute.
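For example, to target a cache-mode KNL node at submission time (the script name myjob.pbs is hypothetical):

```shell
# Request a KNL node in cache memory mode; use aoe=quad_100 for flat mode
qsub -q knlq -l select=1:aoe=quad_0 myjob.pbs
```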
To see the available queues and their current state:
qstat -q
Batch job
MACS MPI example:
#!/bin/bash
#PBS -q pascalq
#PBS -l select=2
#PBS -l walltime=00:01:00
module load intel-parallel-studio-xe/compilers/64
mpirun hostname
XC50 MPI example:
#!/bin/bash
#PBS -q arm
#PBS -l select=2
#PBS -l walltime=00:01:00
aprun -n 32 hostname
XC50 Multi-threaded (e.g. OpenMP) example:
#!/bin/bash
#PBS -q arm
#PBS -l select=2
#PBS -l walltime=00:01:00
export OMP_NUM_THREADS=32
aprun -n 1 -d 32 hostname
Such a script, saved as filename.pbs, can be submitted to the queue using:
qsub filename.pbs
Interactive job
Passing the -I flag to qsub allows a compute node to be used interactively.
For example, to request an interactive job on one of the Pascal nodes utilizing 1 GPU and 16 of the 36 available Broadwell CPU cores, use the following command:
qsub -I -q pascalq -l select=1:ncpus=16:ngpus=1
For XCI, compilations can be run on the login nodes xcil00 & xcil01. Small development jobs can be run in the interactive queue arm-dev.
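For example, a short interactive development session on a single XCI node might be requested as follows (the 30-minute walltime here is illustrative):

```shell
qsub -I -q arm-dev -l select=1 -l walltime=00:30:00
```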
Specifying resources
To avoid blocking resources which aren’t being used by your job, it is important to request only the resources your job actually needs in your job script.
For example, the following command declares that your job will run on a single node and will use one of the two available GPUs. Omitting the ncpus attribute causes it to default to 1, so other jobs can enter the system to use any of the remaining 35 Broadwell CPU cores and the unused GPU.
qsub -I -q pascalq -l select=1:ngpus=1
If you request ngpus=2, then any subsequently submitted job requesting a GPU will not run on that node until it is freed. Similarly, setting ncpus=36 will block any other jobs from running on the node.
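Conversely, if your job genuinely needs an entire Pascal node, it is appropriate to request all of its resources explicitly:

```shell
# Reserve a whole Pascal node: all 36 CPU cores and both GPUs
qsub -I -q pascalq -l select=1:ncpus=36:ngpus=2
```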
Usage History
You can see a limited amount of job history by using the -x flag on qstat, for example:
qstat -x -u $USER
qstat -x -f <JOBID>
Isambard job statistics are not currently available in SAFE.