
Batchjobs on Grendel

All jobs on Grendel must be executed as batchjobs through the queueing system. On Grendel we use Torque, which was formerly known as PBS or OpenPBS.
The MAUI scheduler is used to help Torque prioritize the jobs.
All jobs must be submitted with an appropriate resource request (see below).
A node cannot be shared between jobs, so every job must be carefully prepared to utilize the resources on the requested nodes efficiently.
The frontend must not be used for running jobs, except test jobs that run for a couple of minutes.
Currently these queues are defined:

Queue | Description | Limits/remarks
q4 / qdell | Jobs for the 4 core / 8 GB Dell sc1435 machines | 94 machines. Default walltime 740 hours.
q8 / qx2200 | Jobs for the 8 core / 16 GB SUN x2200 machines | 288 machines. Default walltime 740 hours. This is the default queue.
qfat | Jobs for the 8 core / 32 GB / 2 TB disk SUN x2200 machines | 25 machines. Default walltime 740 hours.
qexp | High priority queue for 4 reserved 8 core / 16 GB SUN x2200 machines | This queue can encompass more than 4 nodes if available. Each user can have at most 1 job running at a time. Each job can span at most 4 nodes and is limited to 1 hour wallclock.
qgpu | Jobs requiring the Nvidia Tesla GPU accelerators | 20 nodes, each with two GPUs. Default walltime 740 hours. Not open to all users; contact staff.
q8n | Jobs for the 8 Intel Nehalem core / 24 GB HP Dl1000 nodes | 196 nodes. Default walltime 740 hours.

When a job starts, a unique directory is created in a local /scratch filesystem on each node allocated to the job. You can refer to this directory as /scratch/$PBS_JOBID
When the job terminates, the scratch directories and their contents are automatically erased.

To benefit from the backfilling mechanism, all jobs should specify a realistic wallclock time. Backfilling helps jobs start earlier; the drawback is that if the specified wallclock time is too short, the job will be terminated abnormally when the limit is reached. The wallclock time can be changed during job execution with qalter -l walltime=hh:mm:ss jobid. However, to avoid fooling the backfill mechanism, only the sysadmin can raise a job's wallclock time.
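
One way to arrive at a realistic walltime is to time a short test run and add a safety margin. The sketch below is only an illustration: to_walltime is a hypothetical helper (not a Grendel command), and the 25% margin is an arbitrary assumption. It converts a number of seconds into the hh:mm:ss form used in the walltime request.

```shell
#!/bin/bash
# Hypothetical helper, not part of Torque or Grendel:
# convert seconds into the hh:mm:ss form used by -l walltime=...
to_walltime() {
    local secs=$1
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Assumed example: a test run took 7200 seconds; add 25% as a margin.
measured=7200
requested=$((measured + measured / 4))
to_walltime "$requested"    # prints 02:30:00
```

The printed value can then be used in the resource request, for instance as walltime=02:30:00.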

Useful commands for handling batchjobs on Grendel:

Submit jobs to the system:
% qsub -q q8 jobscript - submits a job to the 8 core SUN x2200 machines.

% qsub -q q8 -l nodes=N:ppn=M,walltime=hh:mm:ss jobscript - submits a job requiring N nodes, each with M CPUs, and running for hh:mm:ss wallclock time.

% qsub -q qexp -l nodes=N:ppn=M jobscript - submits a short test job to the express queue, requiring N nodes, each with M CPUs.

jobscript is the name of the job command file.
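
As a small illustration of how the resource request is put together, the sketch below assembles the qsub command line from its parts and prints it without submitting anything. The function name qsub_cmd is hypothetical, and the queue, node counts and walltime in the example call are arbitrary values.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the qsub command for a request.
# Arguments: queue, number of nodes, CPUs per node, walltime, jobscript.
qsub_cmd() {
    local queue=$1 nodes=$2 ppn=$3 walltime=$4 script=$5
    echo "qsub -q ${queue} -l nodes=${nodes}:ppn=${ppn},walltime=${walltime} ${script}"
}

# Example values (assumed): 2 nodes with 8 CPUs each in q8 for 12 hours.
qsub_cmd q8 2 8 12:00:00 jobscript
```

Printing the command first makes it easy to check the request before running it on the frontend.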
Delete a job:
% qdel jobid - where jobid is the unique identifier of the job, as shown in the js command output.
Display jobs in the queues:
% js - displays jobs in the queues
% bj [-u] - show the number of CPUs allocated to groups (and to users with -u).
% nodes - show the node and queue status.
% gnodes - show the node status graphically. The number of @'s indicates the load on the node. <!> means that the load is zero. See gnodes -h for more information.
% je - display a job's utilization of the nodes allocated to it.

Example of a jobscript for a serial batchjob:

#!/bin/csh
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:30:00
#PBS -q q4

echo "========= Job started  at `date` =========="

# copy inputdata and the executable to the scratch-directory
cd path/to/my/inputdata
cp *.dat /scratch/$PBS_JOBID
cp prog.exe /scratch/$PBS_JOBID

# change directory to the local scratch-directory, and run:
cd /scratch/$PBS_JOBID
./prog.exe > out

# copy home the outputdata:
cp out $PBS_O_WORKDIR/

echo "========= Job finished at `date` =========="
#
This job can be submitted with this command:
qsub jobscript
The #PBS lines in the jobscript are an alternative way to pass arguments to the queueing system.

Example of a jobscript for parallel execution of several serial tasks


To achieve the best node utilization, an easy trick is to run several processes at the same time in one job. In principle this can be done by starting a number of processes in the background and putting a wait statement at the end. The wait statement is very important: if it is missing, the job will just fork the processes and exit.
Example:
#!/bin/csh
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:30:00
#PBS -q q4

echo "========= Job started  at `date` =========="
cd /scratch/$PBS_JOBID
myprogram arg1 > outdata1 &
myprogram arg2 > outdata2 &
myprogram arg3 > outdata3 &
myprogram arg4 > outdata4 &
wait

cp outdata* $PBS_O_WORKDIR/

echo "========= Job finished at `date` =========="
#
The four processes are started in the background (&) and when all have finished, the output data files are copied back. The wait statement makes the jobscript wait until all (child) processes have finished.
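
The same pattern can be written as a loop, which is convenient when the number of tasks changes. The following is a minimal bash sketch that can be tried outside the queueing system: the myprogram function is a stand-in for a real executable, and a temporary directory is assumed in place of /scratch/$PBS_JOBID.

```shell
#!/bin/bash
# Stand-in for the real 'myprogram' (assumption, for illustration only).
myprogram() {
    sleep 1
    echo "result for $1"
}

# A temporary directory stands in for /scratch/$PBS_JOBID.
workdir=$(mktemp -d)
cd "$workdir"

# Start one background process per argument, each with its own output file.
i=0
for arg in arg1 arg2 arg3 arg4; do
    i=$((i + 1))
    myprogram "$arg" > "outdata$i" &
done

# Without this wait the script would exit while the tasks are still running.
wait

cat outdata1 outdata2 outdata3 outdata4
```

Because the four tasks run concurrently, the whole script takes about as long as one task rather than four.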

Example of a jobscript for parallel execution of tasks on several nodes


The principle above can be extended so that the job uses several nodes. The script also calls the mem and cpus commands (see "Hints") to get the amount of memory and the number of CPU cores in each node.
#!/bin/bash
echo "========= Job started  at `date` =========="
Mreq=1.5  # 'myprogram' requires Mreq GB.
N=0
for node in `sort -u < $PBS_NODEFILE`; do
   Mnode=`mem $node`   # Get amount of memory
   Cnode=`cpus $node`  # Get number of CPUcores
   instances=$(echo "scale=0; $Mnode / $Mreq" | bc)
   [ $instances -gt $Cnode ] && instances=$Cnode # Max Cnode cpus available
   [ $instances -lt 1 ] && continue # Node has not enough memory

   for i in $(seq $instances) ; do
     ((N++))
     rsh -n $node "cd path; ./myprogram < input.$N > output.$N" &
   done
done
wait

echo "========= Job finished at `date` =========="
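
The number of instances started on each node in the script above is the smaller of two limits: how many copies fit in memory, and how many CPU cores the node has. The sketch below works through that calculation for the node types listed in the queue table. The function name instances_for is hypothetical, memory is assumed to be given in GB, and awk is used here instead of bc purely for convenience.

```shell
#!/bin/bash
# Hypothetical helper: instances_for MEM_GB CORES REQ_GB
# Prints min(floor(MEM_GB / REQ_GB), CORES).
instances_for() {
    awk -v m="$1" -v c="$2" -v r="$3" \
        'BEGIN { n = int(m / r); if (n > c) n = c; print n }'
}

instances_for 16 8 1.5   # 10 copies fit in memory, capped at 8 cores: prints 8
instances_for 8 4 1.5    # 5 copies fit in memory, capped at 4 cores: prints 4
instances_for 1 8 1.5    # not enough memory for one copy: prints 0
```

A result of 0 corresponds to the case where the script skips the node because it does not have enough memory.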