Platform LSF – Working with Hosts (bhost, lsload, lsmon)

Host status

Host status describes the ability of a host to accept and run batch jobs in terms of daemon states, load levels, and administrative controls. The bhosts and lsload commands display host status.

 

1. bhosts
Displays the current status of hosts:

STATUS DESCRIPTION
ok  Host is available to accept and run new batch jobs.
unavail  Host is down, or LIM and sbatchd are unreachable.
unreach  LIM is running but sbatchd is unreachable.
closed  Host will not accept new jobs. Use bhosts -l to display the reasons.
unlicensed  Host does not have a valid license.
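On a large cluster, the STATUS column can be tallied quickly with awk. A minimal sketch, run here against canned sample output standing in for a real cluster:

```shell
# Tally hosts by STATUS from bhosts output.
# The variable below is canned sample output; on a live cluster,
# pipe the real command instead: bhosts | awk ...
sample='HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
comp027            ok              -     16      0      0      0      0      0
comp028            closed          -     16     16     16      0      0      0
comp029            unavail         -     16      0      0      0      0      0'

# Skip the header line, count each status, print sorted counts.
status_counts=$(printf '%s\n' "$sample" \
    | awk 'NR > 1 { count[$2]++ } END { for (s in count) print s, count[s] }' \
    | sort)
echo "$status_counts"
```

This gives a one-line-per-status summary (e.g. "closed 1", "ok 1", "unavail 1"), which is easier to scan than hundreds of host rows.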

 

2. bhosts -l
Displays the closed reasons. A closed host does not accept new batch jobs:

$ bhosts -l
HOST  node001
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
closed_Adm      60.00     -     16      0      0      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
Total           0.0   0.0   0.0    0%   0.0     0    0 28656  324G   16G   60G  3e+05   4e+05
Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

processes clockskew netcard iptotal  cpuhz cachesize diskvolume
Total             404.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
Reserved            0.0       0.0     0.0     0.0    0.0       0.0        0.0

processesroot   ipmi powerconsumption ambienttemp cputemp
Total                 396.0   -1.0             -1.0        -1.0    -1.0
Reserved                0.0    0.0              0.0         0.0     0.0


aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
Total         17.0    25.0   128.0    10.0    272.0      48.0   48.0       50.0
Reserved       0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

gambit geom_trans tgrid fluent_par
Total           50.0       50.0  50.0      193.0
Reserved         0.0        0.0   0.0        0.0

 

3. bhosts -X

Displays hosts in uncondensed format, expanding any condensed host groups so that each host appears on its own line:

$ bhosts -X
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
comp027            ok              -     16      0      0      0      0      0
comp028            ok              -     16      0      0      0      0      0
comp029            ok              -     16      0      0      0      0      0
comp030            ok              -     16      0      0      0      0      0
comp031            ok              -     16      0      0      0      0      0
comp032            ok              -     16      0      0      0      0      0
comp033            ok              -     16      0      0      0      0      0

 

4. bhosts -l hostID

Displays all information about a specific server host, such as the CPU factor and the load thresholds used to start, suspend, and resume jobs:

# bhosts -l comp067
HOST  comp067
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -     16      0      0      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
Total           0.0   0.0   0.0    0%   0.0     0    0 13032  324G   16G   60G  3e+05   4e+05
Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

processes clockskew netcard iptotal  cpuhz cachesize diskvolume
Total             406.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
Reserved            0.0       0.0     0.0     0.0    0.0       0.0        0.0

processesroot   ipmi powerconsumption ambienttemp cputemp
Total                 399.0   -1.0             -1.0        -1.0    -1.0
Reserved                0.0    0.0              0.0         0.0     0.0

aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
Total         18.0    25.0   128.0    10.0    272.0      47.0   47.0       50.0
Reserved       0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

gambit geom_trans tgrid fluent_par
Total           50.0       50.0  50.0      193.0
Reserved         0.0        0.0   0.0        0.0

LOAD THRESHOLD USED FOR SCHEDULING:
r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
loadSched   -     -     -     -       -     -    -     -     -      -      -
loadStop    -     -     -     -       -     -    -     -     -      -      -

root maxroot processes clockskew netcard iptotal   cpuhz cachesize
loadSched     -       -         -         -       -       -       -         -
loadStop      -       -         -         -       -       -       -         -

diskvolume processesroot    ipmi powerconsumption ambienttemp cputemp
loadSched        -             -       -                -           -       -
loadStop         -             -       -                -           -       -

 

5. lsload

[user1@login1 ~]$ lsload
HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
login1          ok   0.0   0.0   0.0   1%   0.0  17     0  240G   16G   28G
login2          ok   0.0   0.0   0.0   0%   0.0   0  7040  242G   16G   28G
node1           ok   0.0   0.4   0.3   0%   0.0   0 31760  324G   16G   60G

The status column describes the current state of each host:

STATUS DESCRIPTION
ok Host is available to accept and run batch jobs and remote tasks.
-ok LIM is running but RES is unreachable.
busy Does not affect batch jobs; only affects remote task placement (e.g., lsrun). The value of a load index exceeded a threshold (configured in lsf.cluster.cluster_name, displayed by lshosts -l). Indices that exceed thresholds are identified with an asterisk (*).
lockW Does not affect batch jobs; only affects remote task placement (e.g., lsrun). Host is locked by a run window (configured in lsf.cluster.cluster_name, displayed by lshosts -l).
lockU Will not accept new batch jobs or remote tasks. An LSF administrator or root explicitly locked the host using lsadmin limlock, or an exclusive batch job (bsub -x) is running on the host. Running jobs are not affected. Use lsadmin limunlock to unlock LIM on the local host.
unavail Host is down, or LIM is unavailable.
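To spot hosts that are not in the ok state, the lsload output can be filtered with awk. A minimal sketch against canned sample output (on a live cluster, pipe lsload directly):

```shell
# Print hosts whose lsload status is anything other than "ok".
# The variable below is canned sample output for illustration.
sample='HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
login1          ok   0.0   0.0   0.0   1%   0.0  17     0  240G   16G   28G
node1           busy   2.1*  1.9   1.5   99%  0.0   0     0  324G   16G   60G
node2           unavail   -     -     -     -    -    -     -     -     -     -'

not_ok=$(printf '%s\n' "$sample" | awk 'NR > 1 && $2 != "ok" { print $1, $2 }')
echo "$not_ok"
```

This prints only the problem hosts with their status, a convenient first check before digging into lshosts -l or bhosts -l.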

 

6. lshosts -l
The lshosts -l command shows the host configuration, resources, and load thresholds.

$ lshosts -l
HOST_NAME:  comp001
type             model  cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server nprocs ncores nthreads
X86_64     Intel_EM64T  60.0    16      1    63G    16G 352423M      0    Yes      2      8        1

RESOURCES: Not defined
RUN_WINDOWS:  (always open)

LOAD_THRESHOLDS:
r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem   root maxroot processes clockskew netcard iptotal  cpuhz cachesize diskvolume processesroot   ipmi powerconsumption ambienttemp cputemp
-   3.5     -    -     -     -    -    -     -     -     -      -       -         -         -       -       -      -         -          -             -      -                -           -       -

 

7. References:

  1. Platform – Working with hosts

RHEV 3.4 – Explanation of Settings in the New Data Center and Edit Data Center Windows

The table below describes the settings of a data center as displayed in the New Data Center and Edit Data Center windows. Invalid entries are outlined in orange when you click OK, preventing the changes from being accepted. In addition, field prompts indicate the expected values or range of values.

Figure 1: RHEVM Portal – Data Centres (RHEV_Data-Centres)

Figure 2: RHEVM – Adding a New Data Centre (RHEV_Data-Centres-AddNew)

Details on each field:
Name
The name of the data center. This text field has a 40-character limit and must be a unique name with any combination of uppercase and lowercase letters, numbers, hyphens, and underscores.

Description
The description of the data center. This field is recommended but not mandatory.

Type
The storage type. Choose one of the following:
  • Shared
  • Local
The type of data domain dictates the type of the data center and cannot be changed after creation without significant disruption. Multiple types of storage domains (iSCSI, NFS, FC, POSIX, and Gluster) can be added to the same data center, though local and shared domains cannot be mixed.

Compatibility Version
The version of Red Hat Enterprise Virtualization. Choose one of the following:
  • 3.0
  • 3.1
  • 3.2
  • 3.3
  • 3.4

After upgrading the Red Hat Enterprise Virtualization Manager, the hosts, clusters, and data centers may still be on the earlier version. Ensure that you have upgraded all the hosts, then the clusters, before you upgrade the Compatibility Version of the data center.

Quota Mode
Quota is a resource limitation tool provided with Red Hat Enterprise Virtualization. Choose one of:

  • Disabled: Select if you do not want to implement Quota
  • Audit: Select if you want to edit the Quota settings
  • Enforced: Select to implement Quota

 

Create and Populate a New ISO NFS Storage Domain for RHEV

For the NFS ISO Domain or NFS Data Domain, ensure the following:

  1. Ensure the NFS File System is properly exported
  2. Ensure the user and group ownership is correct

Step 1: Export the ISO directory on your NFS file server. In /etc/exports:

/exports/ISO  192.168.1.0/255.255.255.0(rw,async)

Note that there must be no space between the client specification and the option list; a space would cause the options to apply to all hosts rather than to the listed network.

Export the file system and verify the export list:

# exportfs -av
# showmount -e

 

Step 2: Change permissions
The exported file system must be owned by, and writable by, user vdsm and group kvm:

# chown vdsm.kvm /exports/ISO
# chmod g+s /exports/ISO
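The ownership requirement can be verified before attaching the domain. A small sketch, assuming the export path from Step 1 and GNU stat (on RHEV hosts, vdsm is uid 36 and kvm is gid 36):

```shell
# Verify that a directory is owned by vdsm:kvm before using it as an ISO domain.
check_iso_perms() {
    dir=$1
    # %U:%G prints the symbolic owner and group (GNU coreutils stat).
    owner=$(stat -c '%U:%G' "$dir")
    if [ "$owner" = "vdsm:kvm" ]; then
        echo "$dir ownership OK ($owner)"
    else
        echo "WARNING: $dir is owned by $owner, expected vdsm:kvm"
    fi
}

check_iso_perms /exports/ISO 2>/dev/null
```

Wrong ownership here is one of the most common reasons an ISO domain fails to attach, so this check is worth running after every chown.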

 

Step 3: Check the list of available ISO Domains on the RHEV-M server:

# rhevm-iso-uploader list
ISO Storage Domain Name   | Datacenter                | ISO Domain Status
dmn_ixora_iso_vol         | RH_Resource               | active

 

Step 4: Mount the ISO Domain using the Red Hat Enterprise Virtualization Manager (RHEV-M)

Figure: RHEV-M Storage Domain view (RHEM_Storage_Domain)

 

Step 5: Upload ISO files using the command line. Upload the ISO files to the directory /…/unique-UUID/images/11111111-1111-1111-1111-111111111111

# cd /iso/f21673a0-376e-4381-8760-b681c824dd1a/images/11111111-1111-1111-1111-111111111111

Copy the Linux or Windows ISO file into the above directory:

# scp remote_server:/something-directory/rhel-6.5.iso .

Change the ownership back to vdsm.kvm:

# chown vdsm.kvm rhel-6.5.iso

 

References:

  1. Create export domain or ISO Domain on RHEV 3

Speeding up kernel crash hang analysis with the kernel log

This is a summary of the Red Hat article RHEL6: Speeding up kernel crash / hang analysis with the kernel log.

When there is a kernel crash or hang, a very large file containing a memory dump of the entire system, called a vmcore, is often produced. Analysis of the kernel crash or hang often requires this large file to be uploaded to Red Hat for analysis (if you have a subscription).

 

For RHEL 6.4 and above

Starting with Red Hat Enterprise Linux 6.4 and kexec-tools-2.0.0-258.el6, the kdump process dumps the kernel log to a file called vmcore-dmesg.txt before creating the vmcore file.

# ls /var/crash/127.0.0.1-2012-11-21-09\:49\:25/
vmcore  vmcore-dmesg.txt
# cp /var/crash/127.0.0.1-2012-11-21-09\:49\:25/vmcore-dmesg.txt /tmp/00123456-vmcore-dmesg.txt

For RHEL 6.0 to RHEL 6.3

For other versions of Red Hat Enterprise Linux 6, or for cases where vmcore-dmesg.txt is not generated, you can use the following makedumpfile command to obtain the kernel log from an existing vmcore. (NOTE: The makedumpfile command is part of the kexec-tools package.)

# makedumpfile --dump-dmesg [path-to-vmcore] [kernel-log-file]
# makedumpfile --dump-dmesg /var/crash/127.0.0.1-2013-06-14-16\:26\:07/vmcore /tmp/00123456-vmcore-dmesg.txt
The dmesg log is saved to /tmp/00123456-vmcore-dmesg.txt.
makedumpfile Completed.

NOTE: If the above command fails, it may indicate the vmcore is corrupt to the point of not containing any useful information.
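Once vmcore-dmesg.txt is extracted, the panic message is usually near the end of the log. A quick filter for the usual crash markers (the sample log below is fabricated for illustration):

```shell
# Grep a kernel log for the lines that usually identify a crash.
# A small fabricated log stands in for a real vmcore-dmesg.txt here.
log=$(mktemp)
cat > "$log" <<'EOF'
[  100.123456] usb 1-1: new high-speed USB device
[  200.654321] Kernel panic - not syncing: Fatal exception
[  200.654400] Call Trace:
EOF

clues=$(grep -E -i 'panic|oops|bug:|call trace' "$log")
echo "$clues"
rm -f "$log"
```

Pointing the same grep at /tmp/00123456-vmcore-dmesg.txt gives a first look at the failure before uploading gigabytes of vmcore.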

Deploying HAProxy 1.4.24 to load-balance MS Terminal Services on CentOS 6

HAProxy is a free, open-source, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for very high traffic web sites and powers quite a number of the world's most visited ones. Over the years it has become the de facto standard open-source load balancer, is now shipped with most mainstream Linux distributions, and is often deployed by default in cloud platforms.

The content of this blog entry is taken from Load balancing Windows Terminal Server – HAProxy and RDP Cookies or Microsoft Connection Broker

In this blog entry, we will put together a sample working HAProxy configuration to load balance Terminal Services sessions.

 

Step 1: Install haproxy

# yum install haproxy

Step 2: Modify /etc/haproxy/haproxy.cfg

 

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2

chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4500
user haproxy
group haproxy
daemon
stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
timeout queue 1m
timeout connect 60m
timeout client 60m
timeout server 60m

# -------------------------------------------------------------------
# [RDP Site Configuration]
# -------------------------------------------------------------------
listen cattail 155.69.57.11:3389
mode tcp
tcp-request inspect-delay 5s
tcp-request content accept if RDP_COOKIE
persist rdp-cookie
balance leastconn
option tcpka
option tcplog
server win2k8-1 192.168.6.48:3389 weight 1 check inter 2000 rise 2 fall 3
server win2k8-2 192.168.6.47:3389 weight 1 check inter 2000 rise 2 fall 3
option redispatch

listen stats :1936
mode http
stats enable
stats hide-version
stats realm Haproxy\ Statistics
stats uri /

Information:

  • timeout client and timeout server are set to 1 hour (60m) to keep idle RDP sessions established
  • persist rdp-cookie instructs HAProxy to inspect the incoming RDP connection for a cookie; if one is found, it is used to persistently direct the connection to the correct real server (balance leastconn is used when no cookie is present)
  • The 2 tcp-request lines help to ensure that HAProxy sees the cookie on the initial request.

To see the HAProxy statistics report, browse to http://localhost:1936/ (as set by the stats uri directive above).
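Before reloading the service, the configuration can be syntax-checked with haproxy's -c flag. A guarded sketch (the path assumes the default CentOS location):

```shell
# Syntax-check the HAProxy configuration before (re)starting the service.
# Guarded so the sketch is safe to run on a machine without HAProxy installed.
cfg=/etc/haproxy/haproxy.cfg
if command -v haproxy >/dev/null 2>&1 && [ -f "$cfg" ]; then
    if haproxy -c -f "$cfg"; then
        result="configuration valid"
    else
        result="configuration has errors"
    fi
else
    result="haproxy not installed or $cfg missing"
fi
echo "$result"
```

Running this after every edit avoids taking down live RDP sessions with a typo in the config.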

Reference: