Forcibly kill or purge the Job in the Torque Scheduler

When a job is stuck and cannot be removed by a normal qdel, you can use the command qdel -p jobid. Note that this command should only be used when there is no other way to kill the job in the usual fashion, in particular when the compute node is unresponsive.

# qdel -p jobID
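
As a rough illustration (the job ID 12345 below is hypothetical), a typical sequence is to inspect the stuck job first, purge it, and then confirm it has been removed:

# qstat -f 12345      (check the state of the stuck job)
# qdel -p 12345       (purge the job record from the server)
# qstat 12345         (the job should no longer be listed)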

References:

  1. [torqueusers] qdel will not delete

Uninstalling GPFS rpms cleanly from Client Nodes

Step 1:

# rpm -e gpfs.msg.en_US gpfs.docs gpfs.gpl

Step 2:

# rpm -e gpfs.base
error: Failed dependencies:
gpfs.base is needed by (installed) gpfs.gplbin-2.6.32-279.el6.x86_64-3.4.0-12.x86_64
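
If you are not sure of the exact gpfs.gplbin package name installed on a given node (it embeds the kernel version), you can query rpm first; this query is an addition of mine rather than part of the original procedure:

# rpm -qa 'gpfs.gplbin*'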

Step 3: Remove the specific gplbin

# rpm -e gpfs.gplbin-2.6.32-279.el6.x86_64-3.4.0-12.x86_64

Step 4:

# rpm -e gpfs.base
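
As a final sanity check (not part of the original steps), list any GPFS rpms that are still installed; the output should be empty:

# rpm -qa | grep -i gpfs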

References:

  1. Install and configure General Parallel File System (GPFS) on xSeries

Using Torque to set up a Queue to direct users to a subset of resources

If you are running clusters, you may want to set up a queue in Torque that directs users to a subset of resources. For example, I may wish to direct users who need specific resources such as MATLAB to a particular queue.

More information can be found in the Torque 4.1 documentation, “4.1.4 Mapping a Queue to a subset of Resources”:


“…The simplest method is using default_resources.neednodes on an execution queue, setting it to a particular node attribute. Maui/Moab will use this information to ensure that jobs in that queue will be assigned nodes with that attribute…”

For example, if you are creating a queue for users of MATLAB:

qmgr -c "create queue matlab"
qmgr -c "set queue matlab queue_type = Execution"
qmgr -c "set queue matlab resources_default.neednodes = matlab"
qmgr -c "set queue matlab enabled = True"
qmgr -c "set queue matlab started = True"

For the nodes you are assigning to the queue, remember to update the node properties. A good example can be found at 3.2 Nodes Properties.

To add new properties on the fly:

qmgr -c "set node node001 properties += matlab"

(if you are adding additional properties to the nodes)

To remove properties on the fly:

qmgr -c "set node node001 properties -= matlab"

Displaying SPICE on the VM network for RHEV 3.4

By default, the SPICE graphics server uses the management network to display the console, and the management network is usually not visible to the users.

For RHEV 3.4, this can be easily resolved in the RHEV Manager console:

  1. Portal > Networks
  2. Click on the Network you wish SPICE graphic Server to display on
  3. Click “Manage Network”
  4. Click “Display Network”

Once configured, remember to reboot all the VMs to activate the changes.

[Screenshot: RHEV_SPICE]

Installing and Configuring Red Hat Enterprise Virtualisation

Step 1: Ensure that you have subscribed to Red Hat Virtualisation Channels. For more information, see Subscribing to Red Hat Virtualisation Manager Channels

Step 2: Install RHEVM packages.

This will take a while: roughly 1.6 GB of downloads.

# yum install rhevm rhevm-reports

Step 3: Link the Directory Server to the Red Hat Enterprise Virtualisation Manager.

See Joining RHEVM-Manage-Domain tool to join AD Domain

Step 4: Run the RHEV Manager setup.

# rhevm-setup

Step 5: Go to the Administration Portal website.

[Screenshot: RHEV_portal]

Step 6: Log on to the Portal as admin.

[Screenshot: RHEV_Portal2]

Step 7: Create Data Centre

Step 8: Create and Populate a New ISO NFS Storage Domain

Step 9: Creation of Logical Network

Step 10: Creation of Windows 7 with Virtio

Using log collector in RHEV 3.3 and above to collect full log

The Log Collector utility for RHEV 3 is located at /usr/bin/rhevm-log-collector and is provided by the rhevm-log-collector package installed on the RHEV Manager system.

1. To collect all the information, use the command:

# engine-log-collector
INFO: Gathering oVirt Engine information...
INFO: Gathering PostgreSQL the oVirt Engine database and log files from localhost...
Please provide the REST API password for the admin@internal oVirt Engine user (CTRL+D to skip):
About to collect information from 1 hypervisors. Continue? (Y/n): y
INFO: Gathering information from selected hypervisors...
INFO: collecting information from 192.168.50.56
INFO: finished collecting information from 192.168.50.56
Creating compressed archive...

2. To collect information only from selected hosts, for example hosts ending in .11 and .15:

# engine-log-collector --hosts=*.11,*.15

3. To collect information from the RHEV-M only:

# engine-log-collector --no-hypervisors
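
4. To see which hypervisors the collector would contact before running a full collection, the utility also provides a list action (this is not covered in the original post, so it is worth confirming with engine-log-collector --help on your version):

# engine-log-collector list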

References:

  1. https://access.redhat.com/solutions/61546