Using xsos to summarise System Information

Introduction

The goal of xsos is to make it easy to instantly gather information about a system into an easy-to-read summary, whether that system is the localhost on which xsos is being run or a system for which you have an unpacked sosreport.

xsos does this by parsing, calculating and formatting data from dozens of files (and commands) to give you a detailed overview of a system.

Installation

Manual Install

% git clone https://github.com/ryran/xsos.git
Cloning into 'xsos'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 946 (delta 1), reused 5 (delta 1), pack-reused 940
Receiving objects: 100% (946/946), 907.12 KiB | 728.00 KiB/s, done.
Resolving deltas: 100% (450/450), done.

Point 1: Get information on OS and Memory

Point 2: Get information on Network, Network Adapters and CPU

Point 3: Get Information on BIOS

Point 4: Get Information on sysctl

Point 5: Get information on kdump
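The five views above correspond to individual xsos content options. The wrapper below is a minimal sketch that runs them in one call, either against the localhost or against an unpacked sosreport. The option letters are assumed from the xsos help text, so check them with ./xsos/xsos --help on your own checkout; xsos_summary and the XSOS variable are hypothetical names.

```shell
# Hypothetical wrapper: run the xsos views used in Points 1-5 in one call.
# Assumption: xsos was cloned into ./xsos as above; set XSOS to override the path.
# Assumed option letters (verify with `xsos --help`):
#   -o OS, -m memory, -i IP/network, -l lspci (adapters), -c CPU,
#   -b BIOS, -s sysctl, -k kdump
xsos_summary() {
    local xsos_bin="${XSOS:-./xsos/xsos}"
    # No argument: report on the localhost.
    # One argument: path to the root of an unpacked sosreport.
    "$xsos_bin" -o -m -i -l -c -b -s -k "$@"
}

# Usage:
#   xsos_summary                        # localhost
#   xsos_summary /path/to/sosreport-dir # hypothetical sosreport path
```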

References:

    1. xsos — a tool for sysadmins and support techs
    2. xsos Project Space

Checking nproc limits

One of our Linux compute servers was showing the following error when a particular user attempted to log in:

failed to execute /bin/bash: resource temporarily unavailable

We suspected that the user had breached the nproc limit. The write-up at https://blog.dbi-services.com/linux-how-to-monitor-the-nproc-limit-1/ closely describes the issue I faced.

Extracting information via ps is not useful unless you use the “-L” option to show threads, i.e. LWPs (light-weight processes), because the nproc limit counts every thread a user owns, not just the processes.

% ps h -LA -o user | sort | uniq -c | sort -n
1 chrony
1 dbus
1 libstoragemgmt
1 nobody
1 rpc
1 rpcuser
2 avahi
2 user3
2 postfix
3 colord
3 rtkit
4 user1
4 user2
7 polkitd
23 user4
31 user5
34 user6
361 user7
442 user8
556 gdm
563 user9
922 user10
16384 user11
3319 root

You can see that user11 has 16384 threads!
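To turn the listing above into a quick check, the filter below prints only the users whose thread count reaches a threshold you choose, for example your nproc limit as reported by ulimit -u. It is a minimal sketch; flag_thread_hogs is a hypothetical name and 4096 is only an example threshold.

```shell
# Hypothetical helper: read "count user" lines (the output of
# `ps h -LA -o user | sort | uniq -c`) on stdin and print the users whose
# thread count is at or above the threshold given as $1.
flag_thread_hogs() {
    local threshold="$1"
    awk -v limit="$threshold" '$1 + 0 >= limit + 0 { print $2, $1 }'
}

# Usage on a live system (4096 is only an example threshold):
#   ps h -LA -o user | sort | uniq -c | sort -n | flag_thread_hogs 4096
```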

To dig into what is happening for a selected user, list that user's processes together with their thread counts. We will use user2 since it has one of the fewest LWPs:

% ps -o nlwp,pid,lwp,args -u user2 | sort -n
NLWP PID LWP COMMAND
1 272705 272705 sshd: user2@pts/12
1 273054 273054 sshd: user2@notty
1 273216 273216 /usr/libexec/openssh/sftp-server
1 273406 273406 -bash

nlwp – number of LWPs (threads) in the process
lwp – ID of the LWP (thread ID)
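Because each process reports its own thread count in the nlwp column, summing that column gives the total number of threads a user owns, which is the figure that counts towards the nproc limit. A minimal sketch with a hypothetical function name:

```shell
# Hypothetical helper: sum the NLWP column (one number per line) read on stdin.
# "s + 0" prints 0 rather than an empty line when the user has no processes.
total_threads() {
    awk '{ s += $1 } END { print s + 0 }'
}

# Usage on a live system (the trailing "=" suppresses the header line):
#   ps -o nlwp= -u user2 | total_threads
```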

To kill all of the offending user11’s processes, and with them its thousands of threads:

% pkill -KILL -u user11

References

  1. Linux: how to monitor the nproc limit
  2. How is the nproc hard limit calculated and how do we change the value on CentOS 7

General Linux OS Tuning for AMD EPYC

Step 1: Turn off swap
Turn off swap to prevent accidental swapping. Do note that disabling swap without sufficient memory can have undesired effects.

swapoff -a

Step 2: Turn off NUMA balancing
NUMA balancing can have undesired effects, and since it is possible to bind the ranks and memory in HPC, this setting is not needed.

echo 0 > /proc/sys/kernel/numa_balancing

Step 3: Disable ASLR. ASLR (Address Space Layout Randomization) is a security feature used to prevent the exploitation of memory vulnerabilities, so be aware that disabling it removes that protection.

echo 0 > /proc/sys/kernel/randomize_va_space
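The echo commands in Steps 2 and 3 only last until the next reboot. One way to make them persistent is a drop-in file under /etc/sysctl.d, loaded with sysctl --system. The sketch below writes such a file; the function name and the file name 90-epyc-tuning.conf are hypothetical, and the target path is a parameter so you can inspect the result before installing it.

```shell
# Hypothetical helper: write the Step 2 and Step 3 settings as a sysctl drop-in.
# $1 is the target file, e.g. /etc/sysctl.d/90-epyc-tuning.conf (name is an example).
write_epyc_sysctl() {
    local target="$1"
    cat > "$target" <<'EOF'
# Step 2: turn off automatic NUMA balancing
kernel.numa_balancing = 0
# Step 3: turn off ASLR (weakens protection against memory exploits)
kernel.randomize_va_space = 0
EOF
}

# Usage (as root), then load all sysctl.d files without rebooting:
#   write_epyc_sysctl /etc/sysctl.d/90-epyc-tuning.conf
#   sysctl --system
```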

Step 4: Set the CPU governor to performance and disable CC6. Setting the CPU frequency governor to performance ensures maximum performance at all times. Disabling CC6 ensures that the deeper CPU sleep states are not entered.

cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
.....
.....
cpupower idle-set -d 2
Idlestate 2 disabled on CPU 0
Idlestate 2 disabled on CPU 1
Idlestate 2 disabled on CPU 2
.....
.....
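After applying Steps 1 to 4, it is worth confirming that the settings took effect. The function below is a minimal sketch that reads the relevant /proc and /sys files and prints one line per setting. The function name and the optional root-prefix argument (empty for the live system) are hypothetical, and it only inspects cpu0's governor on the assumption that cpupower set every CPU the same way.

```shell
# Hypothetical helper: report the state of the Step 1-4 settings.
# $1 is an optional root prefix (default: empty, i.e. the live system).
check_epyc_tuning() {
    local root="${1:-}"
    local lines active

    # Step 1: /proc/swaps has a header line plus one line per active swap area.
    lines=$(wc -l < "$root/proc/swaps")
    active=$((lines - 1))
    echo "swap_areas_active=$active"

    # Steps 2 and 3: 0 means disabled.
    echo "numa_balancing=$(cat "$root/proc/sys/kernel/numa_balancing")"
    echo "randomize_va_space=$(cat "$root/proc/sys/kernel/randomize_va_space")"

    # Step 4: governor of cpu0 (assumes all CPUs were set together).
    echo "cpu0_governor=$(cat "$root/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")"
}

# Usage on a live system: check_epyc_tuning
```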

References:

  1. Tuning Guide for AMD EPYC (pdf)

Getting Useful Information on CPU and Configuration

Point 1: lscpu

To install lscpu (part of util-linux):

yum install util-linux

lscpu – prints information about the CPU and its configuration

[user1@myheadnode1 ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
Stepping: 4
CPU MHz: 3200.000
BogoMIPS: 6400.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Flags: fpu .................
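The lscpu fields above can be combined to separate physical cores from hardware threads: physical cores = Socket(s) × Core(s) per socket, and CPU(s) = physical cores × Thread(s) per core. For the Xeon Gold 6134 node above that gives 2 × 8 = 16 physical cores and 16 × 2 = 32 logical CPUs. A minimal sketch of that calculation over lscpu output (the function name is hypothetical):

```shell
# Hypothetical helper: read `lscpu` output on stdin and print the number of
# physical cores and logical CPUs from "Socket(s)", "Core(s) per socket"
# and "Thread(s) per core". Patterns are unanchored so indented output works.
lscpu_core_count() {
    awk -F: '
        /Socket[(]s[)]:/          { sockets = $2 + 0 }
        /Core[(]s[)] per socket:/ { cores   = $2 + 0 }
        /Thread[(]s[)] per core:/ { threads = $2 + 0 }
        END {
            print "physical_cores=" sockets * cores
            print "logical_cpus="   sockets * cores * threads
        }'
}

# Usage: lscpu | lscpu_core_count
```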

Point 2: hwloc-ls

To install hwloc-ls:

yum install hwloc

hwloc-ls – prints useful information about the NUMA locality of devices and general hardware locality information

[user1@myheadnode1 ~]# hwloc-ls
Machine (544GB total)
  NUMANode L#0 (P#0 256GB)
    Package L#0 + L3 L#0 (25MB)
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#16)
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
        PU L#2 (P#1)
        PU L#3 (P#17)
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
        PU L#4 (P#2)
        PU L#5 (P#18)
      L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
        PU L#6 (P#3)
        PU L#7 (P#19)
      L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
        PU L#8 (P#4)
        PU L#9 (P#20)
      L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
        PU L#10 (P#5)
        PU L#11 (P#21)
      L2 L#6 (1024KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
        PU L#12 (P#6)
        PU L#13 (P#22)
      L2 L#7 (1024KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
        PU L#14 (P#7)
        PU L#15 (P#23)
.....
.....
.....
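To summarise the tree by NUMA node, for example to see how many hardware threads each node offers when binding ranks, the hwloc-ls text can be counted with awk. This is a minimal sketch assuming, as in the output above, that each NUMANode line is printed before the PU lines that belong to it; the function name is hypothetical.

```shell
# Hypothetical helper: read `hwloc-ls` text on stdin and print the number of
# PUs (hardware threads) listed under each NUMA node.
pus_per_numa_node() {
    awk '
        $1 == "NUMANode" { split($2, parts, "#"); node = parts[2] }  # "L#0" -> "0"
        $1 == "PU"       { count[node]++ }
        END { for (n in count) print "node" n, count[n] }
    ' | sort
}

# Usage: hwloc-ls | pus_per_numa_node
```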

Point 3: Check whether CPU boost is on (AMD)

Print whether CPU boost is on (1) or off (0):

cat /sys/devices/system/cpu/cpufreq/boost
1
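The check can be wrapped so that it prints a readable answer and copes with systems where the file is absent, since the global boost file is only provided by some cpufreq drivers, such as acpi-cpufreq. A minimal sketch; the function name and the optional path argument are hypothetical.

```shell
# Hypothetical helper: report whether CPU boost is enabled.
# $1 optionally overrides the sysfs path (default shown below).
boost_status() {
    local f="${1:-/sys/devices/system/cpu/cpufreq/boost}"
    if [ ! -r "$f" ]; then
        echo "unknown (no $f)"
        return 1
    fi
    case "$(cat "$f")" in
        1) echo "on" ;;
        0) echo "off" ;;
        *) echo "unknown" ;;
    esac
}

# Usage: boost_status
```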

References:

  1. Tuning Guide for AMD EPYC (pdf)

Turning ksm and ksmtuned off

In this blog post, I will show how to turn off KSM and ksmtuned, since I do not need these services and want to avoid some unnecessary swapping activity on the disk.

What is KSM?

According to the Red Hat documentation (8.4. Kernel Same-Page Merging (KSM)):
Kernel same-page Merging (KSM), used by the KVM hypervisor, allows KVM guests to share identical memory pages. These shared pages are usually common libraries or other identical, high-use data. KSM allows for greater guest density of identical or similar guest operating systems by avoiding memory duplication……

KSM is a Linux feature which uses this concept in reverse. KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page……

8.4.4 Kernel same-page merging (KSM) has a performance overhead which may be too large for certain environments or host systems. KSM may also introduce side channels that could be potentially used to leak information across guests. If this is a concern, KSM can be disabled on per-guest basis.

Deactivating KSM

# systemctl stop ksmtuned
Stopping ksmtuned:                                         [  OK  ]
# systemctl stop ksm
Stopping ksm:                                              [  OK  ]

To permanently deactivate KSM, disable both services with systemctl:

# systemctl disable ksm
# systemctl disable ksmtuned

When KSM is disabled, any memory pages that were shared prior to deactivating KSM are still shared. To delete all of the PageKSM in the system, use the following command:

# echo 2 >/sys/kernel/mm/ksm/run

After this is performed, the khugepaged daemon can rebuild transparent hugepages on the KVM guest physical memory. Using # echo 0 >/sys/kernel/mm/ksm/run stops KSM, but does not unshare all the previously created KSM pages (this is the same as the # systemctl stop ksmtuned command).
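To confirm which state KSM is in, read /sys/kernel/mm/ksm/run (0 = stopped, 1 = running, 2 = stopped and all merged pages unshared) together with pages_shared, which counts the shared pages still in use. A minimal sketch; the function name and the optional directory argument are hypothetical.

```shell
# Hypothetical helper: summarise KSM state from sysfs.
# $1 optionally overrides the directory (default /sys/kernel/mm/ksm).
ksm_status() {
    local dir="${1:-/sys/kernel/mm/ksm}"
    local run shared
    run=$(cat "$dir/run")
    shared=$(cat "$dir/pages_shared")
    case "$run" in
        0) echo "KSM stopped; pages_shared=$shared" ;;
        1) echo "KSM running; pages_shared=$shared" ;;
        2) echo "KSM stopped and unmerged; pages_shared=$shared" ;;
        *) echo "KSM run=$run (unexpected); pages_shared=$shared" ;;
    esac
}

# Usage: ksm_status
```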

References:

  1. Redhat – 8.4. Kernel Same-Page Merging (KSM)

Using strace to detect df hanging issues on NFS

strace is a wonderful tool for tracing system calls and signals.

I was experiencing hangs whenever I ran “df”, and I was curious which file system was causing the issue.

strace df
.....
.....
stat("/run/user/1304561586", {st_mode=S_IFDIR|0700, st_size=40, ...}) = 0
stat("/run/user/17132623", {st_mode=S_IFDIR|0700, st_size=40, ...}) = 0
stat("/run/user/17149581", {st_mode=S_IFDIR|0700, st_size=40, ...}) = 0
stat("/run/user/1304565184", {st_mode=S_IFDIR|0700, st_size=60, ...}) = 0
stat("/scratch",

The trace stops at the stat() call on /scratch, which shows that /scratch is the file system that hangs.
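strace shows where df stops, but on a system with many mounts it can be quicker to probe each mount point with a time limit. The sketch below runs stat -f, which queries file system status much as df does, on every mount point under timeout and reports any that do not answer. The function name, the 5-second limit and the mounts-file argument are hypothetical choices; because a hung NFS client may ignore SIGTERM, timeout is told to follow up with SIGKILL.

```shell
# Hypothetical helper: probe each mount point with a time limit.
# $1 optionally names the mounts file (default /proc/mounts); its second
# field is the mount point. Mount points containing spaces (escaped as \040)
# are not decoded in this sketch.
probe_mounts() {
    local mounts="${1:-/proc/mounts}"
    local dev mnt rest rc
    while read -r dev mnt rest; do
        # Send TERM after 5 s, and KILL 2 s later if stat is still stuck.
        timeout -k 2 5 stat -f "$mnt" > /dev/null 2>&1 < /dev/null
        rc=$?
        case "$rc" in
            0)       echo "ok $mnt" ;;
            124|137) echo "HUNG $mnt" ;;
            *)       echo "error $mnt" ;;
        esac
    done < "$mounts"
}

# Usage: probe_mounts | grep HUNG
```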

Basic GNU Screen Usage on CentOS

Introduction

Screen is a full-screen window manager that multiplexes a physical terminal between several processes, typically interactive shells. In other words, you can start any number of virtual terminals inside the session. The good thing is that processes running inside screen will continue to run even if the SSH session gets disconnected.

GNU Screen Site

GNU Screen can be found at http://ftp.gnu.org/gnu/screen/

Source Code

You can get the source code from here

Using Screen

Screen can easily be installed on CentOS with:

# yum install screen

Starting a Named Session

You may be running many sessions, so it is a good idea to name the session that you are starting:

screen -S your_preferred_screen_name

Listing Running Screen Sessions

[user1@node1 ~]$ screen -ls
There is a screen on:
2109.myScreenA (Detached)
1 Socket in /var/run/screen/S-user1

Reattach to a Screen Session

To reattach to a session, pass its numeric ID (or its name) to screen -r:

screen -r 2109
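When several sessions are running, the numeric ID (the PID of the screen process) can be looked up from a session name by parsing screen -ls. A minimal sketch with a hypothetical function name; since screen -r also accepts the session name directly, this is mainly useful in scripts that need the ID itself.

```shell
# Hypothetical helper: read `screen -ls` output on stdin and print the numeric
# ID of every session whose name matches $1.
# Session entries look like "2109.myScreenA (Detached)".
screen_id_for() {
    local name="$1"
    awk -v want="$name" '
        $1 ~ /^[0-9]+\./ {
            id = $1; sub(/\..*/, "", id)       # part before the first dot
            nm = $1; sub(/^[0-9]+\./, "", nm)  # part after it
            if (nm == want) print id
        }'
}

# Usage: screen -ls | screen_id_for myScreenA
```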

Detaching from a Screen Session

Inside the virtual session, you can detach with:

Ctrl-a d

[Press Ctrl and “a” together, release them, then press “d”]

If you are already outside the virtual session, you can detach an attached session with:

screen -d 2109

Customised Screen

If you are looking at how to split the display into regions with screen, here is a good visual guide.

To Terminate a Screen Session

Reattach to the session, then type exit to end the shell running inside it:

screen -r 2109
exit

User Guide

https://www.gnu.org/software/screen/manual/screen.html#Startup-Files

Links:

How To Use Linux Screen

Unable to open /dev/sdb with fdisk

Taken from my old blog Unable to open /dev/sdb with fdisk

Fdisk is a menu-driven program for creating and manipulating partition tables. The device is usually something like /dev/sda or /dev/sdb, and a device name refers to the entire disk. A partition is named by appending a number to the device name; for example, /dev/sda1 refers to the first partition of the first disk.

If you issue the command below and receive the message “Unable to open /dev/sdb”:

# fdisk /dev/sdb

Unable to open /dev/sdb

Linux is unable to locate the device. To verify this, list the devices that fdisk can see. In the example below, the partition has already been created.

# fdisk -l

Disk /dev/sdb: 2997.4 GB, 2997426536960 bytes
255 heads, 63 sectors/track, 364416 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      267349  2147480811   83  Linux

WARNING: The size of this disk is 3.0 TB (2997400633344 bytes).
DOS partition table format can not be used on drives for volumes
larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID
partition table format (GPT).
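The 2.2 TB figure in the warning comes from the DOS (MBR) partition table storing sector counts in 32 bits: with 512-byte sectors the largest addressable size is (2^32 - 1) × 512 = 2,199,023,255,040 bytes, which is the number fdisk prints. The check below is a minimal sketch of that comparison; the function name is hypothetical and it assumes 512-byte logical sectors.

```shell
# Hypothetical helper: decide whether a disk of $1 bytes needs GPT.
# Assumes 512-byte logical sectors; MBR sector counts are 32-bit.
needs_gpt() {
    local bytes="$1"
    local mbr_max=$(( 4294967295 * 512 ))   # (2^32 - 1) * 512 = 2199023255040
    if [ "$bytes" -gt "$mbr_max" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# Usage (blockdev --getsize64 prints the device size in bytes):
#   needs_gpt "$(blockdev --getsize64 /dev/sdb)"
```

For the /dev/sdb above (2997426536960 bytes) this prints yes, matching the fdisk warning.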

Once you have verified that the device is present, run fdisk /dev/sdb again.

 

xrdp_mm_process_login_response: login failed on CentOS 6

This post is taken from my old blog xrdp_mm_process_login_response: login failed

If you encounter the error xrdp_mm_process_login_response: login failed when using Remote Desktop Connection to connect to a VNC session, and it remains even after restarting xrdp, the issue could be due to a locked X11 session that was created by xrdp.

To solve the issue, go to /tmp/.X11-unix/, find your X session and delete it.

# cd /tmp/.X11-unix

Do a listing

# ls -l

Look for the session owned by you that you wish to delete:

.....
.....
srwxrwxrwx 1 root      root  0 Jul  9  2012 X0
srwxrwxrwx 1 user1  users 0 Jan 25 09:13 X1
srwxrwxrwx 1 user2      users 0 Jul 10  2012 X10
srwxrwxrwx 1 user3     users 0 Feb 19 13:31 X11
srwxrwxrwx 1 user4  users 0 Nov 20 15:10 X12
srwxrwxrwx 1 user5     users 0 Jul 10  2012 X13
.....
.....

Delete that session's socket file (for example, X1 if you are user1).

If xrdp still fails, it may be due to orphaned X sessions. Once xrdp hits an orphaned X session, which may or may not belong to another user, the error will remain.

To see the orphaned X11 sessions, run vncserver; you will see something like this:

# vncserver

Warning: Head-Node:1 is taken because of /tmp/.X11-unix/X1
Remove this file if there is no X server Head-Node:1
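Before deleting anything, you can check which X sockets have no live server behind them. An X server normally records its PID in a lock file /tmp/.X<N>-lock alongside the socket /tmp/.X11-unix/X<N>; if the lock file is missing or its PID is no longer running, the socket is a likely orphan. The function below is a minimal sketch of that heuristic; the function name and the optional base-directory argument are hypothetical, and it only prints candidates rather than deleting them.

```shell
# Hypothetical helper: list X sockets that look orphaned.
# $1 optionally overrides the base directory (default /tmp).
find_orphaned_x_sockets() {
    local base="${1:-/tmp}"
    local sock n lock pid
    for sock in "$base"/.X11-unix/X*; do
        [ -e "$sock" ] || continue           # glob matched nothing
        n="${sock##*/X}"                     # display number, e.g. 1 for X1
        lock="$base/.X$n-lock"
        if [ ! -f "$lock" ]; then
            echo "$sock (no lock file)"
            continue
        fi
        pid=$(tr -d ' \n' < "$lock")         # lock file holds a space-padded PID
        # Checking /proc also works for other users' processes, unlike kill -0.
        if [ -z "$pid" ] || [ ! -d "/proc/$pid" ]; then
            echo "$sock (PID ${pid:-?} not running)"
        fi
    done
}

# Usage: find_orphaned_x_sockets
```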

Delete all the orphaned X session files (for example, /tmp/.X11-unix/X1 in the warning above), then restart the xrdp service and try the remote connection again.

# service xrdp restart

If you are still having the issue, do look at this alternative solution:

  1. X Server — no display in range is available. xrdp_mm_process_login_response: login failed

 

Fixing Authentication is required to set the network proxy used for downloading packages on CentOS 6

This is an old post from my previous Blogger blog: Fixing Authentication is required to set the network proxy used for download packages for CentOS 6

I encountered this pop-up error today when I connected to my CentOS 6 machine via xrdp. The error was something like this:

Authentication is required to set the network proxy used for downloading packages.
An application is attempting to perform an action that requires privileges.
Authentication as the super user is required to perform this action.

The dialog then asked for the root password.



Non-root users

Step 1: Launch a terminal and type:

$ gnome-session-properties

Step 2: Uncheck “PackageKit Update Applet”.


Root User

Step 1: Disable the PackageKit refresh plugin in /etc/yum/pluginconf.d by setting enabled=0:

# vim /etc/yum/pluginconf.d/refresh-packagekit.conf
[main]
enabled=0
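The same change can be made non-interactively, which is handy when fixing several machines. A minimal sketch using sed; the function name is hypothetical and it assumes the file contains an enabled=1 line that should become enabled=0.

```shell
# Hypothetical helper: switch "enabled=1" to "enabled=0" in a yum plugin config.
# $1 optionally overrides the file path. sed -i.bak keeps a backup copy.
disable_packagekit_refresh() {
    local conf="${1:-/etc/yum/pluginconf.d/refresh-packagekit.conf}"
    sed -i.bak 's/^enabled[[:space:]]*=[[:space:]]*1[[:space:]]*$/enabled=0/' "$conf"
}

# Usage (as root): disable_packagekit_refresh
```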