Running Job Arrays on PBS Professional

If you intend to run the same program on many different input files, use a job array instead of writing a separate submission script for each input file, which is tedious. Setting one up is easy.

Amending the Submission Scripts (Part 1)

To create a job array, use the -J option in the PBS script. For 10 sub-jobs, add the following directive:

#PBS -J 1-10

Amending the Submission Scripts (Part 2)

If your input file names end in a running number, for example data1.gjf, data2.gjf, data3.gjf, data4.gjf, data5.gjf ... data10.gjf, use the PBS_ARRAY_INDEX environment variable, which PBS sets to the index of each sub-job, to pick the matching file:

inputfile=data$PBS_ARRAY_INDEX.gjf
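Putting Part 1 and Part 2 together, a minimal submission script might look like the sketch below. The job name array-demo, the helper input_for_index, and the echo line standing in for the real program are assumptions for illustration. PBS_ARRAY_INDEX falls back to 1 here only so the script can be dry-run outside PBS.

```shell
#!/bin/sh
#PBS -N array-demo
#PBS -J 1-10

# Map a sub-job index to its input file name (data1.gjf ... data10.gjf).
input_for_index() {
    printf 'data%s.gjf\n' "$1"
}

# PBS sets PBS_ARRAY_INDEX for each sub-job; fall back to 1 for a dry run outside PBS.
index="${PBS_ARRAY_INDEX:-1}"
inputfile="$(input_for_index "$index")"

# Placeholder for the real program invocation (assumption: replace with your own command).
echo "Sub-job ${index} would process ${inputfile}"
```

Running the script directly with sh is a quick way to check the file name logic before submitting it with qsub.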

Submitting the Jobs

a. To submit the job array, run qsub on the submission script:

% qsub yoursubmissionscript.pbs

Checking Jobs

b. When you run qstat, the job array itself is shown with state “B”, which means the array has begun (at least one sub-job has started):

% qstat -u user1
544198[].node1 Gaussian-09e user1 0 B q32

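To see the individual sub-jobs, use the “-t” or “-Jt” option. With -t, qstat lists jobs, job arrays, and sub-jobs; with -Jt, it lists sub-jobs only.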

% qstat -t 544198[]
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
544198[].node1   Gaussian-09e     user1                   0 B q32
544198[54].node1 Gaussian-09e     user1            00:40:21 R q32
544198[55].node1 Gaussian-09e     user1            00:15:25 R q32

To delete a sub-job, pass its ID to qdel. Quote the ID so the shell does not try to interpret the square brackets:

% qdel "544198[5]"
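Because the sub-job ID contains square brackets, it is easy to mistype in scripts. As a sketch, a small helper can build the ID from the array ID shown by qstat. Note that subjob_id is a hypothetical name, not a PBS command, and the example only prints the qdel command instead of running it.

```shell
#!/bin/sh
# subjob_id ARRAY_ID INDEX
# Hypothetical helper: turns an array ID such as 544198[].node1
# into the sub-job ID 544198[5].node1.
subjob_id() {
    prefix="${1%%\[*}"   # text before the first "[", e.g. 544198
    suffix="${1#*\]}"    # text after the first "]", e.g. .node1
    printf '%s[%s]%s\n' "$prefix" "$2" "$suffix"
}

id="$(subjob_id '544198[].node1' 5)"

# Dry run: show the command, keeping the ID in double quotes.
echo "Would run: qdel \"$id\""
```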

Basic Tracing of Jobs Issues in PBS Professional

Step 1: Proceed to the Head Node (Scheduler)

Once you have the Job ID you wish to investigate, log in to the Head Node and run:

% tracejob jobID

The tracejob output shows which node the job landed on. Next, log in to that node and look through its mom_logs. Each log file is named after its date in YYYYMMDD form:

% vim /var/spool/pbs/mom_logs/thedateyouarelookingat

For example,

% vim /var/spool/pbs/mom_logs/20201211

In Vim, search for the Job ID. Typing ? searches backward from the cursor, and / searches forward:

?yourjobID

This should give you a good hint of what happened. In my case, the NVIDIA drivers on the node were having issues.
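If you only need the matching lines rather than an interactive search, grep can do the same job. The sketch below assumes the log location used above. The helper mom_log_for_date is a hypothetical name, and the job ID and date are example values only.

```shell
#!/bin/sh
# Hypothetical helper: build the mom_logs path for a date in YYYYMMDD form.
mom_log_for_date() {
    printf '/var/spool/pbs/mom_logs/%s\n' "$1"
}

jobid="544198"                            # example job ID (assumption)
logfile="$(mom_log_for_date 20201211)"    # example date from above

if [ -r "$logfile" ]; then
    # -n prefixes each matching line with its line number in the log.
    grep -n -- "$jobid" "$logfile" || echo "No entries for job $jobid in $logfile"
else
    echo "Cannot read $logfile (run this on the node where the job landed)"
fi
```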

Resolving Altair Access Incorrect UserName and Password

If you are seeing an error like “Incorrect UserName or Password”, run the following on the main system supporting the Visualisation Server (which may or may not be the server hosting the Altair Access services).

/etc/init.d/altairlmxd stop
/etc/init.d/altairlmxd start
/etc/init.d/pbsworks-pa restart

On the Altair Access Server,

/etc/init.d/guacd restart

Altair HPC Virtual Summit 2020

Join Altair’s high-performance and high-throughput computing experts, along with our partners, technology users, and industry peers, for a virtual summit exploring the leading-edge enterprise computing solutions that will keep innovation moving forward in 2020 and beyond.

From orchestrating compute workloads that get more dynamic by the day to supporting distributed teams all while meeting demand for cost-saving, efficiency-enhancing solutions, today’s technology infrastructure stakeholders play an integral role in ensuring their organizations retain a competitive edge.

On September 9th and 10th, HPC leaders across the globe will meet for two half days of virtual PBS Professional user groups, “ask the developer” sessions, panel discussions and more. For more information, see https://hpc2020.virtual.altair.com/

Date: September 9th & 10th

Restrict Number of Queued and Running Jobs with PBS Professional

Apply a maximum queued jobs limit at Server Level

% qmgr -c "set server max_queued = [u:PBS_GENERIC=128]"

Apply a maximum queued jobs limit at Queue Level

% qmgr -c "set queue your-queue-name max_queued = [u:PBS_GENERIC=128]"

Apply a maximum running jobs limit at Server Level

% qmgr -c "set server max_run = [u:PBS_GENERIC=128]"

Apply a maximum running jobs limit at Queue Level

% qmgr -c "set queue your-queue-name max_run = [u:PBS_GENERIC=128]"

Here u:PBS_GENERIC means the limit applies to each user who does not have an individual limit of their own.

Limiting Users on PBS Professional

Scenario 1: How do we restrict users to a maximum job size and to a maximum number of cores across their concurrent jobs?

For example, suppose you want to restrict each user of this queue to a maximum of 4 cores per job, while the cores used by all of that user's concurrently running jobs cannot exceed 16.

qmgr -c "set queue workq max_run_res.ncpus = [u:PBS_GENERIC=16]"
qmgr -c "set queue workq resources_max.ncpus = 4"

The first limit sets a maximum of 16 cores per user across all of that user's running jobs in the workq queue.
The second limit sets a maximum of 4 cores per job for the workq queue.
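To see how the two limits interact, the arithmetic can be sketched in a few lines of shell. The variable names are illustrative only; the numbers are the ones set above.

```shell
#!/bin/sh
per_user_cores=16   # max_run_res.ncpus for u:PBS_GENERIC
per_job_cores=4     # resources_max.ncpus

# Number of largest-size jobs one user can run at the same time.
max_full_jobs=$((per_user_cores / per_job_cores))

echo "A user can run at most ${max_full_jobs} concurrent ${per_job_cores}-core jobs"
```

As far as these two limits are concerned, a user submitting 1-core jobs could have up to 16 of them running at once, since only the total core count is capped.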

Scenario 2: How do we ensure that users request at least a minimum number of cores per job in the queue?

For example, suppose you want to require a minimum of 32 cores per job.

qmgr -c "set queue workq resources_min.ncpus = 32"

Test: a job requesting only 16 cores is rejected at submission.

qsub -l select=1:ncpus=16 -q workq -- /bin/sleep 100
qsub: Job violates queue and/or server resource limits
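The rule being enforced can be sketched as a simple comparison. This is only an illustration of the minimum-cores check under the limit set above, not how PBS implements it; ncpus_allowed is a hypothetical helper name.

```shell
#!/bin/sh
min_ncpus=32   # resources_min.ncpus set on workq above

# Hypothetical pre-submission check: succeeds when the request meets the minimum.
ncpus_allowed() {
    [ "$1" -ge "$min_ncpus" ]
}

for request in 16 32 64; do
    if ncpus_allowed "$request"; then
        echo "ncpus=${request}: meets the minimum"
    else
        echo "ncpus=${request}: below the minimum of ${min_ncpus}, qsub would reject it"
    fi
done
```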