Error “Too many files open” on CentOS 7

If you encounter “Too many open files” error messages during login and the session is terminated automatically, it is because the number of open files for a user or for the whole system has exceeded the default limit, and you may wish to raise it.

@ System Level

To see the system-wide setting for the maximum number of open files:

# cat /proc/sys/fs/file-max
55494980

This value is the maximum number of files that all processes running on the system can open in total. By default this number varies according to the amount of RAM in the system; as a rough guideline it is about 100,000 files per GB of RAM.

 

To override the system-wide maximum number of open files, edit /etc/sysctl.conf

# vim /etc/sysctl.conf
 fs.file-max = 80000000

Activate this change on the live system

# sysctl -p

@ User Level

To see the setting for maximum open files for a user

# su - user1
$ ulimit -n
1024

To change the setting, edit /etc/security/limits.conf (as root)

# vim /etc/security/limits.conf
user1 - nofile 2048

To change for all users

* - nofile 2048

This sets the maximum number of open files for ALL users to 2048. The limits.conf settings are applied by PAM at login, so they take effect for new login sessions; a reboot is not required.
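As a minimal sketch (assuming a stock CentOS 7 setup where pam_limits also reads /etc/security/limits.d/, and where the file name 91-nofile.conf and the hard limit of 4096 are just illustrative choices), the override can also be kept in a drop-in file and verified from a fresh login:

# vim /etc/security/limits.d/91-nofile.conf
*     soft    nofile    2048
*     hard    nofile    4096

# su - user1 -c 'ulimit -n'
2048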

References:

  1. How to correct the error “Too many files open” on Red Hat Enterprise Linux

Tools to Show your System Configuration

Tool 1: Display Information about CPU Architecture

[user1@node1 ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
Stepping: 4
CPU MHz: 3200.000
BogoMIPS: 6400.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
.....
.....

Tool 2: List all PCI devices

[user1@node1 ~]# lspci -t -vv
-+-[0000:d7]-+-00.0-[d8]--
| +-01.0-[d9]--
| +-02.0-[da]--
| +-03.0-[db]--
| +-05.0 Intel Corporation Device 2034
| +-05.2 Intel Corporation Sky Lake-E RAS Configuration Registers
| +-05.4 Intel Corporation Device 2036
| +-0e.0 Intel Corporation Device 2058
| +-0e.1 Intel Corporation Device 2059
| +-0f.0 Intel Corporation Device 2058
| +-0f.1 Intel Corporation Device 2059
| +-10.0 Intel Corporation Device 2058
| +-10.1 Intel Corporation Device 2059
| +-12.0 Intel Corporation Sky Lake-E M3KTI Registers
| +-12.1 Intel Corporation Sky Lake-E M3KTI Registers
| +-12.2 Intel Corporation Sky Lake-E M3KTI Registers
| +-12.4 Intel Corporation Sky Lake-E M3KTI Registers
| +-12.5 Intel Corporation Sky Lake-E M3KTI Registers
| +-15.0 Intel Corporation Sky Lake-E M2PCI Registers
| +-16.0 Intel Corporation Sky Lake-E M2PCI Registers
| +-16.4 Intel Corporation Sky Lake-E M2PCI Registers
| \-17.0 Intel Corporation Sky Lake-E M2PCI Registers
.....
.....

Tool 3: List block devices

[user@node1 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0  1.1T  0 disk
├─sda1            8:1    0  200M  0 part /boot/efi
├─sda2            8:2    0    1G  0 part /boot
└─sda3            8:3    0  1.1T  0 part
  ├─centos-root 253:0    0   50G  0 lvm  /
  ├─centos-swap 253:1    0    4G  0 lvm  [SWAP]
  └─centos-home 253:2    0    1T  0 lvm

Tool 4: See flags kernel booted with

[user@node1 ~]$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8

Tool 5: Display available network interfaces

[root@hpc-gekko1 ~]# ifconfig -a
eno1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether XX:XX:XX:XX:XX:XX txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
.....
.....
.....

Tool 6: Using dmidecode to find hardware information
See Using dmidecode to find hardware information
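The post linked above has the full walkthrough; as a quick reference, a few common dmidecode queries (run as root, using dmidecode's standard type keywords) look like this:

# dmidecode -t system       # vendor, product name, serial number
# dmidecode -t bios         # BIOS vendor, version and release date
# dmidecode -t memory       # DIMM slots, sizes and speeds
# dmidecode -t processor    # CPU sockets as reported by the firmware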

Using qperf to measure network bandwidth and latency

qperf is a network bandwidth and latency measurement tool that works over many transports, including TCP/IP, RDMA, UDP and SCTP.

Step 1: Installing qperf on both Server and Client

Server> # yum install qperf
Client> # yum install qperf

Step 2: Open Up the Firewall on the Server

# firewall-cmd --permanent --add-port=19765-19766/tcp
# firewall-cmd --reload

The server listens on TCP port 19765 by default; this is the control connection. Once a client connects, qperf opens a separate data connection for each test, so a data port must also be allowed through the firewall. Here we use 19766, which the client selects with -ip.

Step 3: Have the Server listen (as qperf server)

Server> $ qperf

Step 4: Connect to qperf Server with qperf Client and measure bandwidth

Client > $ qperf -ip 19766 -t 60 qperf_server_ip_address tcp_bw
tcp_bw:
    bw  =  2.52 GB/sec

Step 5: Connect to qperf Server with qperf Client and measure latency

Client> $ qperf -vvs qperf_server_ip_address tcp_lat
tcp_lat:
    latency         =  20.7 us
    msg_rate        =  48.1 K/sec
    loc_send_bytes  =  48.2 KB
    loc_recv_bytes  =  48.2 KB
    loc_send_msgs   =  48,196
    loc_recv_msgs   =  48,196
    rem_send_bytes  =  48.2 KB
    rem_recv_bytes  =  48.2 KB
    rem_send_msgs   =  48,197
    rem_recv_msgs   =  48,197
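As a small addition (standard qperf usage, but do verify the test names against qperf --help on your version), several tests can be chained in one client invocation, and the listening server can be shut down remotely with the special quit test:

Client> $ qperf qperf_server_ip_address tcp_bw tcp_lat udp_lat
Client> $ qperf qperf_server_ip_address quit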

References:

  1. How to use qperf to measure network bandwidth and latency performance?

Using FIO to measure IO Performance

FIO is a tool for measuring IO performance. You can use FIO to run a user-defined workload and collect the associated performance data.

A good write-up can be found at Fio Basics. For more information, see man fio.

Step 1: To install fio, make sure you have activated the EPEL repository.

# yum install fio

An example invocation, with its parameters explained below (a second example follows the list).

% fio --filename=myfio --size=5GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1
  • --filename
    A file name or an entire block device can be specified. For example, to test block devices directly: --filename=/dev/sda:/dev/sdb
  • --size
    The size of the test file generated. If you are testing a block device and did not specify the size, the entire device will be used.
  • --direct=1
    Use non-buffered I/O; the page cache is bypassed.
  • --rw
    Sets whether I/O is accessed sequentially or randomly. There are a few options:
    read – sequential read
    write – sequential write
    randread – random read
    randwrite – random write
    readwrite, rw – mixed sequential workload
    randrw – mixed random workload
  • --bs
    Block size. By default 4 kB is used.
  • --ioengine=libaio
    libaio enables asynchronous I/O from the application level. This engine requires --direct=1 to be effective.
  • --iodepth
    The number of I/O requests kept in flight per job.
  • --numjobs
    The number of processes started in parallel for the job.
  • --runtime
    Terminate processing after the specified period of time.
  • --name
    Name of the job.
  • --time_based
    fio will run for the duration of the runtime specified, repeating the workload if necessary.
  • --group_reporting
    Display statistics for the group of jobs as a whole instead of for each individual job. This is especially useful when numjobs is greater than 1.
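For comparison, a sketch of a sequential read test in the same style; the job name seq-read-job is illustrative, and the larger block size with a single job is the usual choice when measuring streaming throughput rather than IOPS:

% fio --filename=myfio --size=5GB --direct=1 --rw=read --bs=1M --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=1 --time_based --group_reporting --name=seq-read-job --eta-newline=1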

 

Listing processes for a specific user

Using top to list processes for a specific user (htop, one of my favourites, can do the same interactively):

% top -U user1

pstree displays a tree of processes and can include parent and child processes, which makes the relationships easier to understand.

% pstree -l -a -p -s user1


where
-l : Long format
-a : Show command line args
-p : Display Linux PIDs
-s : See parents of the selected process

pgrep looks up or signals processes based on name and other attributes.

% pgrep -l -u user1

References:

  1. Linux list processes by user names

Using xsos to summarise System Information

Introduction

The goal of xsos is to make it easy to instantaneously gather information about a system together in an easy-to-read summary, whether that system is the localhost on which xsos is being run or a system for which you have an unpacked sosreport.

xsos will attempt to make it easy, parsing and calculating and formatting data from dozens of files (and commands) to give you a detailed overview about a system.

 

Installation

Manual Install

% git clone https://github.com/ryran/xsos.git
Cloning into 'xsos'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 946 (delta 1), reused 5 (delta 1), pack-reused 940
Receiving objects: 100% (946/946), 907.12 KiB | 728.00 KiB/s, done.
Resolving deltas: 100% (450/450), done.
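The clone only fetches the script, so it still needs to be put on the PATH. A minimal sketch, assuming the xsos script sits at the top of the cloned repository and that /usr/local/bin is an acceptable destination:

% cd xsos
% chmod +x xsos
% sudo cp xsos /usr/local/bin/
% xsos --help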

Point 1: Get information on OS and Memory

Point 2: Get information on Network, Network Adapters and CPU

Point 3: Get Information on BIOS

Point 4: Get Information on sysctl

Point 5: Get information on kdump
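The points above correspond to xsos's report selectors. The single-letter flags below are recalled from the tool's help output and may differ between versions, so treat them as assumptions and confirm with xsos --help:

% xsos -o     # Point 1: OS summary (hostname, distro, uptime) and memory overview
% xsos -m     # Point 1: detailed memory report
% xsos -c     # Point 2: CPU report (flag assumed; the network selectors are also listed in --help)
% xsos -b     # Point 3: BIOS / dmidecode summary
% xsos -k     # Point 5: kdump configuration
% xsos -a     # everything, which also covers the sysctl-related output for Point 4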

References:

    1. xsos — a tool for sysadmins and support techs
    2. xsos Project Space

Checking nproc limits

One of our Linux compute servers was showing the following error when a particular user attempted to log in:

failed to execute /bin/bash: resource temporarily unavailable

We suspected that the nproc limit had been breached by that particular user. I found this write-up, https://blog.dbi-services.com/linux-how-to-monitor-the-nproc-limit-1/, which describes the issue I faced very well.

Extracting information via ps is not useful unless you use “-L” to show threads (LWPs, light-weight processes), because the nproc limit counts threads, not just processes.

 

% ps h -LA -o user | sort | uniq -c | sort -n
1 chrony
1 dbus
1 libstoragemgmt
1 nobody
1 rpc
1 rpcuser
2 avahi
2 user3
2 postfix
3 colord
3 rtkit
4 user1
4 user2
7 polkitd
23 user4
31 user5
34 user6
361 user7
442 user8
556 gdm
563 user9
922 user10
3319 root
16384 user11

You can see that user11 has 16384 threads!

To dig down into what is happening for a selected user, we will use user2, since it has one of the fewest LWPs to examine.

% ps -o nlwp,pid,lwp,args -u user2 | sort -n
NLWP PID LWP COMMAND
1 272705 272705 sshd: user2@pts/12
1 273054 273054 sshd: user2@notty
1 273216 273216 /usr/libexec/openssh/sftp-server
1 273406 273406 -bash

nlwp – number of LWPs (threads) in the process
lwp – ID of the LWP (thread)
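To confirm which limit was hit, it could help to check the per-user process limit from a shell owned by that user. On CentOS 7 the default soft nproc for non-root users normally comes from /etc/security/limits.d/20-nproc.conf; a small sketch, assuming that stock file is still in place:

% su - user11 -c 'ulimit -u'      # soft limit on the number of processes/threads for this user
% cat /etc/security/limits.d/20-nproc.conf
*          soft    nproc     4096
root       soft    nproc     unlimited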

To eliminate the offending user11’s thousands of threads

% pkill -KILL -u user11

References:

  1. Linux: how to monitor the nproc limit
  2. How is the nproc hard limit calculated and how do we change the value on CentOS 7

General Linux OS Tuning for AMD EPYC

Step 1: Turn off swap
Turn off swap to prevent accidental swapping. Do note that disabling swap without sufficient memory can have undesired effects.

swapoff -a

Step 2: Turn off NUMA balancing
NUMA balancing can have undesired effects, and since it is possible to bind ranks and memory explicitly in HPC, this setting is not needed.

echo 0 > /proc/sys/kernel/numa_balancing

Step 3: Disable ASLR (Address Space Layout Randomization). ASLR is a security feature used to prevent the exploitation of memory vulnerabilities; disabling it removes a source of run-to-run variability, but consider the security implications.

echo 0 > /proc/sys/kernel/randomize_va_space

Step 4: Set the CPU governor to performance and disable cc6. Setting the CPU frequency governor to performance ensures maximum performance at all times. Disabling cc6 ensures that deeper CPU sleep states are not entered.

cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
.....
.....
cpupower idle-set -d 2
Idlestate 2 disabled on CPU 0
Idlestate 2 disabled on CPU 1
Idlestate 2 disabled on CPU 2
.....
.....
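Note that the echo and cpupower settings above do not survive a reboot. A minimal sketch of making the two kernel settings persistent with a sysctl drop-in (the file name 98-hpc-tuning.conf is arbitrary); swap is disabled permanently by commenting out the swap line in /etc/fstab, and the governor and idle-state settings are usually reapplied at boot via tuned or a small systemd unit:

# vim /etc/sysctl.d/98-hpc-tuning.conf
kernel.numa_balancing = 0
kernel.randomize_va_space = 0

# sysctl --system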

References:

  1. Tuning Guide for AMD EPYC (pdf)

Getting Useful Information on CPU and Configuration

Point 1. lscpu

To install

yum install util-linux

lscpu – (Print out information about CPU and its configuration)

[user1@myheadnode1 ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
Stepping: 4
CPU MHz: 3200.000
BogoMIPS: 6400.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Flags: fpu .................

Point 2: hwloc-ls

To install hwloc-ls

yum install hwloc

hwloc – (Prints out useful information about the NUMA locality of devices and general hardware locality information)

[user1@myheadnode1 ~]# hwloc-ls
Machine (544GB total)
NUMANode L#0 (P#0 256GB)
Package L#0 + L3 L#0 (25MB)
L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#16)
L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
PU L#2 (P#1)
PU L#3 (P#17)
L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
PU L#4 (P#2)
PU L#5 (P#18)
L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
PU L#6 (P#3)
PU L#7 (P#19)
L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
PU L#8 (P#4)
PU L#9 (P#20)
L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
PU L#10 (P#5)
PU L#11 (P#21)
L2 L#6 (1024KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
PU L#12 (P#6)
PU L#13 (P#22)
L2 L#7 (1024KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
PU L#14 (P#7)
PU L#15 (P#23)
.....
.....
.....

Point 3: Check whether Core Performance Boost is on (AMD)

Print out whether CPU boost is on (1) or off (0):

cat /sys/devices/system/cpu/cpufreq/boost
1
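On systems using the acpi-cpufreq driver (typical for AMD EPYC under CentOS 7), the same sysfs file can be written to toggle boost; a quick sketch, run as root:

echo 0 > /sys/devices/system/cpu/cpufreq/boost     # disable Core Performance Boost
echo 1 > /sys/devices/system/cpu/cpufreq/boost     # re-enable it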

References:

  1. Tuning Guide for AMD EPYC (pdf)

Turning ksm and ksmtuned off

In this blog, I will write about how to turn off KSM and ksmtuned, since I do not need these services and want to avoid the unnecessary background activity they generate.

What is KSM?

According to RedHat Site (8.4. KERNEL SAME-PAGE MERGING (KSM)),
Kernel same-page Merging (KSM), used by the KVM hypervisor, allows KVM guests to share identical memory pages. These shared pages are usually common libraries or other identical, high-use data. KSM allows for greater guest density of identical or similar guest operating systems by avoiding memory duplication……

KSM is a Linux feature which uses this concept in reverse. KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page……

Section 8.4.4 of the same guide notes that Kernel same-page merging (KSM) has a performance overhead which may be too large for certain environments or host systems. KSM may also introduce side channels that could potentially be used to leak information across guests. If this is a concern, KSM can be disabled on a per-guest basis.

Deactivating KSM

# systemctl stop ksmtuned
# systemctl stop ksm

To permanently deactivate KSM with the systemctl commands

# systemctl disable ksm
# systemctl disable ksmtuned

When KSM is disabled, any memory pages that were shared prior to deactivating KSM are still shared. To delete all of the PageKSM in the system, use the following command:

# echo 2 >/sys/kernel/mm/ksm/run

After this is performed, the khugepaged daemon can rebuild transparent hugepages on the KVM guest physical memory. Using # echo 0 >/sys/kernel/mm/ksm/run stops KSM, but does not unshare all the previously created KSM pages (this is the same as the # systemctl stop ksmtuned command).
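To confirm the result, a quick check of the service state and the run flag (0 = stopped, 1 = running, 2 = stop and unshare) could look like this:

# systemctl status ksm ksmtuned | grep -E 'Loaded|Active'
# cat /sys/kernel/mm/ksm/run
0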

References:

  1. Redhat – 8.4. Kernel Same-Page Merging (KSM)