The main way of submitting a job is to submit a jobscript with the sbatch
command. The input files for the course that you copied earlier include example jobscripts for you to modify. We will begin by inspecting the jobscript hello.slurm
. First we need to move to the correct directory. If you followed the previous instructions the training materials will be in ~/scratch/balena-intro
. We can change to this directory and display the jobscript:
cd ~/scratch/
cat balena-intro/hello/hello.slurm
The jobscript consists of several parts. The first line tells the operating system that the file is a script.
This is followed by a series of lines beginning #SBATCH
, which are specific instructions to the Slurm scheduler. Some of these, such as --job-name
are self-explanatory. Others you can hopefully interpret from earlier episodes, e.g. --partition
. Some need further clarification:
--account=AAA-AAAXX
The project account. If you pay for a premium account then you will have an account code of this form. If you use the free account you should replace the code with free
. For this training we provide you with an account so that you do not have to wait as long in the queue.
--reservation=training
A premium account gives your jobs higher priority, but they will not necessarily run immediately. We have therefore reserved part of Balena for the training so that specific nodes are free to run your jobs. A reservation would not normally appear in a jobscript, so remove this line after the lesson. If you keep it, your job will look for a reservation that no longer exists, because the reservation is removed once this training is finished.
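To look at only the scheduler directives in a jobscript, you can filter for the lines beginning #SBATCH. The sketch below writes a hypothetical jobscript header to a temporary file and lists its directives. The values shown (job name, node count, time limit, output file names) are placeholders chosen for illustration and are not the contents of hello.slurm. The reservation line is left out, as it would be after the lesson.

```bash
# Hypothetical jobscript header, written to a temporary file for illustration.
# All values are placeholders, not the contents of the course's hello.slurm.
sketch=$(mktemp)
cat > "$sketch" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=batch-devel
#SBATCH --account=free
#SBATCH --nodes=1
#SBATCH --time=00:05:00
#SBATCH --output=hello.out
#SBATCH --error=hello.err
echo "job body goes here"
EOF

# Show only the lines that Slurm reads as scheduler directives
grep '^#SBATCH' "$sketch"

# Count them, then remove the temporary file
directive_count=$(grep -c '^#SBATCH' "$sketch")
rm -f "$sketch"
```

The same grep can be pointed at balena-intro/hello/hello.slurm to list the directives in the course jobscript.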
The next section loads modules that enable us to run parallel jobs using the Intel
libraries. Notice how we explicitly purge
the current modules, then add in just the modules that are required for the job to run.
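The purge-then-load pattern can be tried by hand as a rough sketch. The module names used on Balena are not given here, so the example searches for modules matching "intel" rather than loading a guessed name. The placeholder in the commented-out load line is hypothetical. Note that running this in your login shell will unload any modules you currently have loaded.

```bash
# Sketch of the purge-then-load pattern; only runs if the module command exists.
if type module >/dev/null 2>&1; then
    module purge          # unload everything inherited from the current shell
    module avail intel    # list available modules whose names match "intel"
    # module load <name>  # placeholder: load only what the job needs
    module list           # confirm which modules are now loaded
    modules_checked=yes
else
    echo "module command not available in this shell"
    modules_checked=no
fi
```

Starting from a purged environment means the job does not depend on whatever happened to be loaded when you submitted it.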
The next section creates a small script. If you are not familiar with Linux scripts, don't worry too much about this. It is standard practice, when writing a program in a new tool or a new language, to write a variant of Hello world
that prints hello to the screen. These lines of code essentially create a simple parallel version of the standard Hello world
program.
The final line of the jobscript executes this small script in parallel. You can submit the jobscript with:
sbatch balena-intro/hello/hello.slurm
Check that the job has run correctly by using cat
to display the output files hello.out
and hello.err
.
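Put together, submitting and checking the job might look like the sketch below. It assumes that you are in ~/scratch, that Slurm is available on the machine you are logged in to, and that the output files are written to the directory you submitted from. The last of these depends on the jobscript and is an assumption here.

```bash
# Sketch only: assumes you are in ~/scratch on a machine with Slurm installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch balena-intro/hello/hello.slurm   # prints the ID of the submitted job
    squeue --user="$USER"                   # list your queued and running jobs
    status="submitted"
else
    echo "sbatch not found; run this on a login node"
    status="skipped"
fi

# Once the job has left the queue, display whichever output files exist.
# Assumption: they are written to the current directory.
for f in hello.out hello.err; do
    if [ -f "$f" ]; then
        echo "== $f =="
        cat "$f"
    fi
done
```

An empty hello.err together with greetings in hello.out is a good sign that the job ran as intended.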
Thus far we have run the job through the queue; however, sometimes we want to check whether our jobscript or code is working correctly. To test this you have two options. Firstly, if you just want to check a jobscript, you can run on up to 4 nodes for up to 15 minutes in the development partition by setting the partition to batch-devel
.
Alternatively, we may want to run our code interactively, for which we have a number of options. By default, interactive jobs are launched on the itd
nodes, a shared resource that may be used by multiple users at the same time:
sinteractive
which will launch your job instantly, or you can use the short form sint
. You can also launch an interactive job with explicit options:
sinteractive --partition=[partition] <--account=[account id]> <--reservation=[reservation]>
where you should specify the partition, account and reservation if needed. Other parameters such as time and nodes can also be specified.
If you are running free jobs, your jobs are limited to 24 node hours (nodes * time), with maximums of 32 nodes and 6 hours. Thus if you run on a larger node count you can only run for a shorter time. If you need to run for longer, on more nodes, or just want your jobs to have higher priority, you can pay for a premium account.
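The trade-off between node count and run time for free jobs can be worked out with a little arithmetic. The sketch below uses only the limits quoted above (24 node hours, at most 32 nodes, at most 6 hours). The function name max_minutes is made up for this example, and how the scheduler itself rounds or enforces the limits is an assumption.

```bash
# Longest run time, in minutes, for a free job on a given number of nodes.
# Limits from the text: 24 node hours, at most 32 nodes, at most 6 hours.
max_minutes() {
    local nodes=$1
    local node_minutes=$((24 * 60))   # 24 node hours expressed in node minutes
    local cap=$((6 * 60))             # 6 hour ceiling expressed in minutes
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt 32 ]; then
        echo "node count must be between 1 and 32" >&2
        return 1
    fi
    local minutes=$((node_minutes / nodes))
    if [ "$minutes" -gt "$cap" ]; then
        minutes=$cap
    fi
    echo "$minutes"
}

for n in 1 4 8 16 32; do
    echo "$n nodes: up to $(max_minutes "$n") minutes"
done
```

Up to 4 nodes the 6 hour ceiling is the binding limit; beyond that the 24 node hour budget takes over, so 8 nodes allow 3 hours and 32 nodes allow 45 minutes.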
You can only run one interactive job and one development job at a time.