PBS
Special Queue: gpu
#!/bin/bash
#PBS -q gpu
#PBS -N cuda
#PBS -l nodes=1:ppn=1
echo "Hello from $HOSTNAME: date = `date`"
nvcc --version
echo "Finished at `date`"
usage
qsub -I -l nodes=1:ppn=8,feature=gpu -l walltime=12:00:00
qsub -I run.pbs
qsub -I -l nodes=1:ppn=8,ngpus=2,nodetype=xl -l walltime=12:00:00 run.pbs1
Special case
A user running interactive jobs can simply ask for exclusive access to the nodes:
qsub -I -lselect=2:ncpus=2 -lplace=excl
This requests 2 chunks of 2 cpus, so PBS could fulfil it with either one node or two. Adding "excl" tells PBS not to let the remaining cores on the selected node(s) be used by a subsequent job.
qsub -I -lselect=2:ncpus=4
would allow you to select two whole nodes.
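If you also want those whole nodes held exclusively, the two options above can be combined (a sketch using the same flags):
qsub -I -lselect=2:ncpus=4 -lplace=excl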
usage
qmgr -c 'print queue workq'
qmgr -c 'list queue workq'
qmgr -c 'list queue'
qmgr -c "print queue @default"
qmgr -c "list node n001"
active queue workq
active queue workq,gpu
Example: To make all queues at the default server active:
Qmgr: active queue @default
To create a queue named Q1 at the active server:
Qmgr: create queue Q1
qmgr -c "list node n023 resources_available.host"
qmgr -c "list queue workq resources_assigned.mpiprocs"
Set node offline
set node state = "offline"
Qmgr: set node n002 state=offline
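To return the node to service, the offline state is typically cleared by setting it back to free (verify against your PBS version):
Qmgr: set node n002 state=free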
gpu
basic
Example of Configuring PBS for Basic GPU Scheduling
In this example, there are two execution hosts, HostA and HostB, and each execution host has 4 GPU devices.
1. Stop the server and scheduler. On the server's host, type:
/etc/init.d/pbs stop
2. Edit PBS_HOME/server_priv/resourcedef, and add the following line:
ngpus type=long flag=nh
3. Edit PBS_HOME/sched_priv/sched_config to add ngpus to the list of scheduling resources:
resources: "ncpus, mem, arch, host, vnode, ngpus"
4. Restart the server and scheduler. On the server's host, type:
/etc/init.d/pbs start
5. Add the number of GPU devices available to each execution host in the cluster via qmgr:
Qmgr: set node HostA resources_available.ngpus=4
Qmgr: set node HostB resources_available.ngpus=4
vi /var/spool/PBS/server_priv/resourcedef
vi /var/spool/PBS/sched_priv/sched_config
/etc/init.d/pbs stop
/etc/init.d/pbs start
qmgr
Max open servers: 49
Qmgr: set node n020 resources_available.ngpus=2
Qmgr: set node n021 resources_available.ngpus=2
Qmgr: set node n022 resources_available.ngpus=2
Qmgr: set node n023 resources_available.ngpus=2
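With ngpus configured on these nodes, a job can then request GPUs per chunk, for example (a sketch; adjust cpus and walltime as needed):
qsub -I -l select=1:ncpus=1:ngpus=1 -l walltime=01:00:00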
ngpus
http://www.beowulf.org/archive/2010-March/027640.html
1. Create a custom resource called "ngpus" in the resourcedef file as in:
ngpus type=long flag=nh
2. This resource should then be explicitly set on each node that includes a GPU to the number it includes:
set node compute-0-5 resources_available.ncpus = 8
set node compute-0-5 resources_available.ngpus = 2
Here I have set the number of cpus per node (8) explicitly to defeat hyper-threading, and the actual number of gpus per node (2). On the other nodes you might have:
set node compute-0-6 resources_available.ncpus = 8
set node compute-0-6 resources_available.ngpus = 0
indicating that there are no gpus to allocate.
3. You would then use the '-l select' option in your job file as follows:
#PBS -l select=4:ncpus=2:ngpus=2
This requests 4 PBS resource chunks, each including 2 cpus and 2 gpus. Because the resource request is "chunked", each 2 cpu x 2 gpu chunk is placed together on one physical node. Because some nodes were marked as having 2 gpus and some as having 0, only those that have them will get allocated. As a consumable resource, as soon as 2 are allocated the total available drops to 0. In total you would have asked for 4 chunks distributed to 4 physical nodes (because only one of these chunks can fit on a single node). This also ensures a 1:1 mapping of cpus to gpus, although it does nothing about tying each cpu to a different socket; you would have to do that in the script, probably with numactl.
There are other ways to approach this, such as tying physical nodes to queues, which you might wish to do to set up a dedicated slice for GPU development. You may also be able to do this in PBS using the vnode abstraction. There might be some reason to have two production routing queues that map to slightly different parts of the system.
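The same chunked request can also be made directly on the qsub command line (a sketch; run.pbs is the job script name used earlier):
qsub -l select=4:ncpus=2:ngpus=2 -l walltime=01:00:00 run.pbs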
memory
1. Create a custom static integer memcount resource that will be tracked at the server and queue:
a. In PBS_HOME/server_priv/resourcedef, add the line:
memcount type=long flag=q
b. Add the resource to the resources: line in PBS_HOME/sched_priv/sched_config:
resources: "[...], memcount"
2. Set limits at BigMem and SmallMem so that they accept the correct jobs:
Qmgr: set queue BigMem resources_min.mem = 8gb
Qmgr: set queue SmallMem resources_max.mem = 8gb
3. Set the order of the destinations in the routing queue so that BigMem is tested first, so that jobs requesting exactly 8GB go into BigMem:
Qmgr: set queue RouteQueue route_destinations = "BigMem, SmallMem"
4. Set the available resource at BigMem using qmgr. If you want a maximum of 6 jobs from BigMem to use MemNode:
Qmgr: set queue BigMem resources_available.memcount = 6
5. Set the default value for the counting resource at BigMem, so that jobs inherit the value:
Qmgr: set queue BigMem resources_default.memcount = 1
6. Associate the vnode with large memory with the BigMem queue:
Qmgr: set node MemNode queue = BigMem
The scheduler will only schedule up to 6 jobs from BigMem at a time on the vnode with large memory.

5.7.3 Setting Values for String Arrays
A string array that is defined on vnodes can be set to a different set of strings on each vnode.
Example of defining and setting a string array:
• Define a new resource:
foo_arr type=string_array flag=h
• Setting via qmgr:
Qmgr: set node n4 resources_available.foo_arr="f1, f3, f5"
• Vnode n4 has 3 values of foo_arr: f1, f3, and f5. We add f7:
Qmgr: set node n4 resources_available.foo_arr+=f7
• Vnode n4 now has 4 values of foo_arr: f1, f3, f5 and f7.
• We remove f1:
Qmgr: set node n4 resources_available.foo_arr-=f1
• Vnode n4 now has 3 values of foo_arr: f3, f5, and f7.
• Submission:
qsub -l select=1:ncpus=1:foo_arr=f3

Local setup for the nodetype string array:
vi /var/spool/PBS/server_priv/resourcedef
nodetype type=string_array flag=h
vi /var/spool/PBS/sched_priv/sched_config
resources: "ncpus, mem, arch, host, vnode, netwins, aoe, ngpus, memcount, nodetype"
/etc/init.d/pbs stop
sleep 10
/etc/init.d/pbs start
Qmgr: set node n020 resources_available.nodetype="xl"
Qmgr: set node n021 resources_available.nodetype="xl"
Qmgr: set node n022 resources_available.nodetype="xl"
Qmgr: set node n023 resources_available.nodetype="xl"
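Once nodetype is set on those vnodes, jobs can target the xl nodes with the select syntax (a sketch):
qsub -I -l select=1:ncpus=1:nodetype=xl -l walltime=01:00:00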
creating queues
xl queue
create queue xl
set queue xl queue_type = Execution
set queue xl Priority = 50
set queue xl resources_min.nodetype = xl
set queue xl resources_default.nodetype = xl
set queue xl max_run = [u:PBS_GENERIC=1]
set queue xl enabled = True
set queue xl started = True
gpu
create queue gpu
set queue gpu queue_type = Execution
set queue gpu Priority = 50
set queue gpu resources_min.nodetype = xl
set queue gpu resources_default.nodetype = xl
set queue gpu max_run = [u:PBS_GENERIC=1]
set queue gpu enabled = True
set queue gpu started = True
sample pbs script to run jobs on the gpu queue
see: http://confluence.rcs.griffith.edu.au:8080/display/GHPC/BigDFT
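The linked page has the full example; in outline, such a script is just a gpu-queue header plus the application run (a minimal sketch, with ./your_gpu_program as a placeholder for the real binary):
#!/bin/bash
#PBS -q gpu
#PBS -N gpujob
#PBS -l select=1:ncpus=2:ngpus=2
#PBS -l walltime=12:00:00
cd $PBS_O_WORKDIR
./your_gpu_program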
pbs attributes
Special group to run restricted software
The following groups exist:
DHI
stormsurge
nimrodusers
vasp
aerc
msr
genomics
gccm
glycomics
dbres
You could add this line in the pbs script:
#PBS -W group_list=nimrodusers
or simply use it in qsub as follows:
qsub -q gpu -W group_list=nimrodusers -l walltime=01:00:00
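Inside a job script the equivalent header would be (a sketch, combining the directives above):
#!/bin/bash
#PBS -q gpu
#PBS -W group_list=nimrodusers
#PBS -l walltime=01:00:00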