May 12, 2024 by kittycool only

Optimizing Ansible Performance: Serial Execution

By default, Ansible parallelises tasks on multiple hosts simultaneously and speeds up automation in large inventories. But sometimes, this is not ideal in a load-balanced environment, where upgrading the servers simultaneously may cause the loss of services. How do we use Ansible to run the updates at different times? I use the keyword “serial” before executing the roles universal package.

- hosts: standalone_nodes
  become: yes
  serial: 1 
  roles:
        - linux_workstation

Alternatively, you can use percentages to indicate how many will upgrade at one time.

- hosts: standalone_nodes
  become: yes
  serial: 25%
  roles:
        - linux_workstation

References:

How to implement parallelism and rolling updates in Ansible

May 12, 2024 by kittycool only

Ansible Delayed Error Handling with Rescue Blocks: Chrony Setup Example

A recap there are 2 main use of Blocks in Ansible. The first write-up can be found at Grouping Tasks with Block in Ansible

Apply conditional logic to all the tasks within the block. In such a way, the logic only need to be declared once
Apply Error handling especially when recovering from an error condition.

Today, we will deal with Point 2 in this blog entry

According to Ansible Documentation found at https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_blocks.html

Rescue blocks specify tasks to run when an earlier task in a block fails. This approach is similar to exception handling in many programming languages. Ansible only runs rescue blocks after a task returns a ‘failed’ state. Bad task definitions and unreachable hosts will not trigger the rescue block.

Here is my simple example for implementation

- name: Check current Timezone
  command: timedatectl show --property=Timezone --value
  register: timezone_output
  changed_when: false

- name: Configure Timezone to Asia/Singapore
  command: timedatectl set-timezone Asia/Singapore
  when: timezone_output.stdout != "Asia/Singapore"

- name: Install and Configure Chrony Service Block
  block:
    - name: Install Chrony package
      dnf:
        name: chrony
        state: present

    - name: Configure Chrony servers
      lineinfile:
        path: /etc/chrony.conf
        line: "server sg.pool.ntp.org iburst"
        insertafter: '^#.*server 3.centos.pool.ntp.org iburst'
        state: present

    - name: Enable Chrony service
      service:
        name: chronyd
        state: started
        enabled: yes
  rescue:
    - name: Print when Errors
      debug:
        msg: 'Something failed at Chrony Setup'
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

May 6, 2024 by kittycool only

How to Install Rustup for All Linux Users in a Cluster or Shared Environment

Rustup is an installer for System Programming Language Rust. If you are running the installer script for yourself, you can take a look at https://github.com/vi/rustup

# curl -sf https://static.rust-lang.org/rustup.sh | sudo sh

But what if you have to install for all users in a Cluster or Shared Environment, how do we do it? Fortunately, we have good information that is found at Installing rustup for all linux users . I took some snipplets from that site

In that article, the author recommends to define RUSTUP_HOME and CARGO_HOME

# export RUSTUP_HOME=/usr/local/rust
# export CARGO_HOME=/usr/local/rust
# curl https://sh.rustup.rs -sSf | sh -s -- -y --no-modify-path

The directory should be installed at /usr/local/rust/toolchains/stable-x86_64-unknown-linux-gnu

# ls -l /usr/local/rust/toolchains/stable-x86_64-unknown-linux-gnu
drwxr-xr-x. 2 root root 0 May  6 13:43 bin
drwxr-xr-x. 3 root root 0 May  6 13:42 etc
drwxr-xr-x. 3 root root 0 May  6 13:43 lib
drwxr-xr-x. 2 root root 0 May  6 13:43 libexec
drwxr-xr-x. 5 root root 0 May  6 13:42 share

April 21, 2024 by kittycool only

Linux Journaling vs. Copy-On-Write Filesystems: A Comparison

Hope you have read

This comparison table is taken from the book “Architecture and Design of the Linux Storage Stack” which I find useful to help understand the differences between the two.

	Journaling	Copy-On-Write
Write handling	Changes are recorded in a journal before applying them to the actual file system	A separate copy of data is created to make modifications
Original data	Original data gets overwritten	Original data remains intact
Data Consistency	Ensures consistency by recording metadata changes and replaying them if needed	Ensures consistency by never modifying the original data
Performance	Minimal overhead depending on the type of journaling mode	Some performance gains because of faster writes
Space utilisation	Journal size is typically in MB, so no additional space is required	More space is required due to separate copies of data
Recovery times	Fast recovery times as the journal can be replaced instantly	Slower recovery times as data needs to be reconstructed using recent copies
Features	No built-in support for features such as compression or deduplication	Built-in support for compression and deduplication

Taken from “Architecture and Design of the Linux Storage Stack”

April 16, 2024 by kittycool only

Options for ANSYS Batch Mode

Options	Description
-aas	start Fluent in server mode
-act	load ACT Start page
affinity=<x>	set processor affinity; <x>={core \| sock \| off}
-app	launches the Remote Visualization Client
-appscript=<scriptfile>	run the specified script in App
-case <file_path> [-data]	reads the case file immediately after Fluent launches; can include the data file if it shares the same name as the case file
-cflush	flush the file cache buffer
-cnf=<x>	specify the hosts file
-command=”<TUI command>”	run TUI command on startup,
-driver <name>	sets the graphics driver; <name>={opengl \| opengl2 \| x11 \| null}
-env	show environment variables
-g	run without GUI or graphics
-gpgpu=<n>	specify number of GPGPUs per machine
-gpu[=<n>]	run with GPU Solver, and specify devices to use as needed (where <n> is a comma-separated list of devices)
-gr	run without graphics
-gu	run without GUI
-gui_machine=<hostname>	specify the machine to be used for running graphics-related process
-h<heap_size>	specify heap size for Cortex
-help	this listing
-host_ip=<host:ip>	specify the ip interface to be used by the host process
-i <journal>	read the specified journal file
-license=<x>	specify the license capability; <x>={enterprise \| premium}
-meshing	run Fluent in meshing mode
-mpi=<mpi>	specify MPI implementation; <mpi>={openmpi \| intel \| …},
-mpitest	run the mpitest program instead of Fluent to test the network
-nm	don’t display mesh after reading
-pcheck	check the network connections before spawning compute node processes
-platform=intel	use AVX2 optimized binary; This option is for processors that can support AVX2 instruction set
-post	run a post-processing-only executable
-prepost	run a pre/post-processing-only executable
-p<ic>	specify interconnect; <ic>={default \| eth \| ib},
-r	list all releases
-r<x>	specify release <x>

April 14, 2024 by kittycool only

Copy-on-write Filesystem in Linux in a Nutshell

The Working of a Copy-on-Write System In Brief

Copy-On-Write Filesystem does not overwrite the data in place, here is how it is done. Supposedly there is file that will be modified.

Copy the old data to an allocated location on the disk
New data is written to the allocated location on the disk.
Hence the name Copy-and-Write
The references for the new data are updated
However, the old data and its snapshots are still there

As described in the Architecture and Design of Linux Storage Stack by Muhammad Umer Page 59

As the old data is preserved in the process, filesystem recovery is very simplified. Since the previous state of the data is saved on another allocated location on disk. If there is an outrage, the system system can easily revert to its former state. This make the maintenance of any Journal obsolete. This also allows snapshots to be implemented at the filesystem level.

As the old data is still there, space utilisation may be more than what the user expects……

Some of the filesystem the use the CoW based approach includes Zttabyte Filesystem (ZFS) and B-Tree Filesystem (Btrfs)

April 8, 2024 by kittycool only

Journaling File System: Advantages, Working, and Impact on Performance

Definition

According to a nice explanation by minitools.com “What Is Journaling File System and Its Advantages/Disadvantages“

The journaling file system (JFS) is a kind of file system developed by IBM IN 1990. It keeps track of changes, which are not yet committed to the file system’s main part, by recording the goal of such changes in a data structure known as “journal”. Usually, the “journal” is a circular log.

In the event of a system crash or power failure, a journaling file system can be brought back online more quickly with a lower chance of being corrupted. Depending on the actual implementation, the JFS may only keep track of stored metadata, which results in improved performance at the expense of increased possibility for data corruption.
What Is Journaling File System and Its Advantages/Disadvantages

The Working of a Journal File System In Brief

Here is a diagram taken from Architecture and Design of Linux Storage Stack by Muhammad Umer Page 57

According to the Chapter 3 of the book,

From the diagram, any changes made to the filesystem are written sequentially to a journal, also called a transaction. Once a transaction is written to a journal, it is written to an appropriate location on a disk. In the case of a system crash, the filesystem replays the journal to see whether any transaction is incomplete. When the transaction has been written to its on-disk location, it is removed from the Journal.

It is interesting to note that either the metadata or actual data is first written to the data. Either way, once written to the filesystem, the transaction is removed from the journal. The size of the journal can be a few megabytes.

Benefits of Journal File System and Impact on Performance

Besides making the Filesystem more reliable and preserving its structure in system crashes and hardware failures, the burning question is whether it will impact performance?

Generally, journaling improves performance when it is enabled by having fewer seeks to the physical disks as data is only when a journal is committed or when the journal fills up. For example, in intense meta-data operations like recursive operations on the directory and its content, journaling improves performance by reducing frequent trips to disks and performing multiple updates as a single unit of work.

March 16, 2024 by kittycool only

SSH Error – Receive Packet Type 51 on Rocky Linux 8

If you are having SSH issues and if you turned on high verbosity and the following output is generated

# ssh -vvv XXX.XXX.XXX.XXX

.....
.....
debug1: Offering public key: debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 51
.....
.....
debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
user1@192.168.0.1: Permission denied (publickey,gssapi-with-mic,password)

According to SSH protocol (RFC 4252), these are the general authentication message codes

SSH_MSG_USERAUTH_REQUEST            50
SSH_MSG_USERAUTH_FAILURE            51
SSH_MSG_USERAUTH_SUCCESS            52
SSH_MSG_USERAUTH_BANNER             53

It means that the authentiation methods mentioned in the “Permission Denied” was not accepted. What are some of the common issues.

Type 1 (Most Common Mistake): Permission Errors on the .ssh Folder and files inside .ssh

For more information, see Encountering SSH “Permission denied, please try again.”

Type 2: Incorrect Configuration Settings on the /etc/ssh/sshd_config
(Assuming you are using Password Authentication) Inside /etc/ssh/sshd_config, you should have something like

PermitRootLogin no
.....
PasswordAuthentication yes
.....
ChallengeResponseAuthentication no
.....
GSSAPIAuthentication yes
GSSAPICleanupCredentials no
.....
UsePAM yes

Type 2: Incorrect Configuration Settings on the /etc/ssh/ssh_config
In Rocky Linux 8, everything should be commented except the last line “Include /etc/ssh/ssh_config.d/*.conf”

References:

March 7, 2024 by kittycool only

GLIBCXX_3.4.25 and cannot support c++17.GLIBCXX Issues

If you have compiled a new appication in an updated GCC-12.3 and still wondering why the Error still surfacing like the message below when you run the binary that was compiled in the GCC-12.3 and not the system GCC:

GLIBCXX version in Wildfly using “strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX” and found the latest version in Cluster is only GLIBCXX_3.4.25 and cannot support c++17.

First step in troubleshooting is really to find out what the binary used for its Shared Object Dependencies.

$ ldd main_fcc
./main_fcc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by ./main_fcc)
./main_fcc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./main_fcc)
	linux-vdso.so.1 (0x00007fffffbd1000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f4827ea1000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f4827b1f000)
	libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f48278e7000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f48276cf000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f48274af000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f48270ea000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4828236000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f4826ee6000)

For the above example, “libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f4827ea1000)”, the application seems to be pointing back to the system libraries rather than the libraries you are using. You many want to “module load …….” if you are using Module Environment or configure the PATH, LD_LIBRARY_PATH, MANPATH and CPLUS_INCLUDE_PATH of the GCC you . In the end, the Shared Object Dependencies should be pointing to

$ ldd main_fcc
        linux-vdso.so.1 (0x00007fffffbd1000)
	libstdc++.so.6 => /usr/local/gnu/gcc-12.3/libstdc++.so.6 (0x00007f4827ea1000)
	libm.so.6 => /usr/local/gnu/gcc-12.3/libm.so.6 (0x00007f4827b1f000)
	libgomp.so.1 => /usr/local/gnu/gcc-12.3/libgomp.so.1 (0x00007f48278e7000)
	libgcc_s.so.1 => /usr/local/gnu/gcc-12.3/libgcc_s.so.1 (0x00007f48276cf000)
	libpthread.so.0 => /usr/local/gnu/gcc-12.3/libpthread.so.0 (0x00007f48274af000)
	libc.so.6 => /usr/local/gnu/gcc-12.3/libc.so.6 (0x00007f48270ea000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4828236000)
	libdl.so.2 => /usr/local/gnu/gcc-12.3/libdl.so.2 (0x00007f4826ee6000)

We have experience in VS Code Environment, where the Shared Object Dependencies was observed to be pointing back to the System Libraries even when using Environment Modules. Then it is best to put in .bashrc something the one below.

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# User specific aliases and functions
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
alias ls='ls --color=auto'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

module load gnu/gcc-12.3
module load gnu/gdb-14.2

# User specific aliases and functions

February 25, 2024 by kittycool only

Shortening BASH Commands (Part 1)

I had a casual read on the book “Bash Idioms” by Carl Albing. I scribbled what I learned from what stuck me the most. There are so much more. Please read the book instead.

Lesson 1: “A And B” are true only if both A and B are true…..

Example 1: If the cd command succeeds, then execute the “rm -Rv *.tmp” command

cd tmp && rm -Rv *.tmp

Lesson 2: If “A is true”, “B is not executed” and vice versa.

Example 2: Change Directory, if fail, put out the message that the change directory failed and exit

cd /tmp || { echo "cd to /tmp failed" ; exit ; }

Lesson 3: When do we use the [ ] versus [[ ]]?

I learned that the author advises BASH users to use the [[ ]] unless when avoidable. The Double Bracket helps to avoid confusing edge case behaviours that a single bracket may exhibit. If however, the main goal is portability across various platform to non-bash platforms, single quota may be advisable.

The Linux Cluster

Linux Cluster Blog is a collection of how-to and tutorials for Linux Cluster and Enterprise Linux

Linux