Ansible Execution Strategies

Ansible's execution is flexible enough that we can change how and when tasks are executed. These settings can be made globally (in ansible.cfg) or at the play level. The number of parallel processes is determined by the forks setting; the default is 5.

Execution Strategy | Explanation
linear | The default. Each task runs against all hosts in the batch, up to the number of forks at a time, and Ansible waits until every host has finished that task before moving on to the next task.
debug | Task execution is like the linear strategy, but controlled by an interactive debug session.
free | Ansible does not wait for other hosts to finish a task before queuing more tasks for hosts that are already done, so hosts that finish early are not blocked by slower ones.

Example at ansible.cfg:
....
[defaults]
strategy = linear
forks = 10

Or at play level:
- name: web servers
  hosts: webservers
  strategy: linear

The same settings apply to the other strategies: replace linear with debug or free.
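A minimal sketch for trying out the strategies: the hypothetical helper write_ansible_cfg below writes a project-level ansible.cfg and, if Ansible is installed, runs ansible-config dump --only-changed to show which strategy and forks values Ansible actually picked up. It assumes ANSIBLE_CONFIG is not set and that you run ansible-playbook from the same directory, since that is where Ansible looks for a project ansible.cfg.

```shell
#!/usr/bin/env bash
# write_ansible_cfg [DIR] [STRATEGY] [FORKS]  (hypothetical helper)
# Writes DIR/ansible.cfg with the given strategy and forks values.
write_ansible_cfg() {
  local dir=${1:-.} strategy=${2:-linear} forks=${3:-5}
  cat > "$dir/ansible.cfg" <<EOF
[defaults]
strategy = $strategy
forks = $forks
EOF
  # If Ansible is installed, show the values it reads from this directory.
  if command -v ansible-config >/dev/null 2>&1; then
    (cd "$dir" && ansible-config dump --only-changed | grep -E 'DEFAULT_(STRATEGY|FORKS)') || true
  fi
}
```

For example, write_ansible_cfg . free 10 followed by ansible-playbook site.yml (site.yml is a placeholder) runs the play with the free strategy and 10 forks.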

Rolling Update Strategies

The forks setting is limited by hardware: the more processing resources the control node has, the higher forks can be set. But there are occasions where the number of parallel executions is limited by software or application restrictions instead. For example, for a rolling update of your web servers, you can use the serial parameter to control how many hosts are updated at a time. This way, the hosts are not all updated simultaneously, which avoids downtime.

Keyword | Explanation
serial | Runs the whole play on one batch of hosts before moving on to the next batch. The batch size can be an absolute number or a percentage.

Example:
- name: Serial Example
  hosts: webservers
  serial: "50%"

  tasks:
    - name: First task
      ansible.builtin.debug:        # placeholder action
        msg: "Updating {{ inventory_hostname }}"
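To see how a serial value splits the hosts into batches, here is a small sketch. The helper serial_batches is hypothetical, and it assumes the rule Ansible is generally understood to apply: a percentage is taken of the total host count and rounded down, with a minimum of one host per batch, and 0 means all hosts at once. Ansible computes percentages in floating point, so rare rounding edge cases may differ.

```shell
#!/usr/bin/env bash
# serial_batches TOTAL_HOSTS SERIAL  (hypothetical helper)
# Prints the size of each rolling-update batch, one per line.
# Assumption: percentage of the total host count, rounded down, minimum 1.
serial_batches() {
  local total=$1 serial=$2 size remaining
  if [[ $serial == *% ]]; then
    size=$(( ${serial%\%} * total / 100 ))
    if (( size < 1 )); then size=1; fi
  else
    size=$serial
  fi
  # A serial value of 0 is treated as "all hosts in one batch".
  if (( size <= 0 )); then size=$total; fi
  remaining=$total
  while (( remaining > 0 )); do
    # The last batch holds whatever hosts are left over.
    if (( remaining < size )); then size=$remaining; fi
    echo "$size"
    remaining=$(( remaining - size ))
  done
}
```

For example, serial_batches 5 "50%" prints 2, 2 and 1 on separate lines. Ansible also accepts a list of batch sizes, such as serial: [1, 5, "20%"], which this sketch does not model.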

References:

  • Red Hat Certified Engineer (RHCE) Ansible Automation Study Guide (Alex Soto Bueno)

Ansible-Playbook Commonly Used Optional Arguments

These are commonly used optional arguments for ansible-playbook:

Flag | Description
-i, --inventory <INVENTORY> | Specify the inventory host path or a comma-separated host list
-b, --become | Run operations with become (does not imply password prompting). Uses the existing privilege escalation method, such as sudo, dzdo, or runas
-K, --ask-become-pass | Ask for the privilege escalation password
--become-method <BECOME_METHOD> | Privilege escalation method to use. Default is sudo
--become-password-file <BECOME_PASSWORD_FILE> | Privilege escalation password file
--become-user <BECOME_USER> | Run operations as this user. Default is root
-f <FORKS>, --forks <FORKS> | Specify the number of parallel processes to use. Default is 5
-e, --extra-vars | Set additional variables as key=value (or YAML/JSON). To load a file containing a set of variables, prefix the file name with @
-t <TAGS>, --tags <TAGS> | Only run plays and tasks tagged with these values
--ask-vault-password, --ask-vault-pass | Ask for the vault password
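As a sketch of how these flags combine in practice, the hypothetical wrapper run_site below builds one ansible-playbook command line. The inventory.ini, site.yml and vars/web.yml files and the "web" tag are placeholder names; setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# run_site [FORKS]  (hypothetical wrapper; all file names are placeholders)
# Set DRY_RUN=1 to print the command instead of running it.
run_site() {
  local forks=${1:-10}
  local -a cmd
  cmd=(
    ansible-playbook
    -i inventory.ini   # inventory file (or a comma-separated host list)
    -b                 # become, using the configured method (sudo by default)
    -K                 # prompt for the become password
    -f "$forks"        # number of parallel processes
    -e @vars/web.yml   # extra variables loaded from a file
    -t web             # run only plays and tasks tagged "web"
    site.yml
  )
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Keeping the command in a bash array means each flag and value is passed as a separate argument, so values with spaces do not need extra quoting.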

References:

  • Red Hat Certified Engineer (RHCE) Ansible Automation Study Guide (Alex Soto Bueno)

Running the NVIDIA Driver and CUDA Toolkit Runfile and Encountering the ORC Metadata Error for CONFIG_UNWINDER_ORC=y on Rocky Linux 8

After downloading the NVIDIA driver and CUDA Toolkit, make sure you select a CUDA Toolkit version that corresponds to the CUDA driver. The CUDA Toolkit package already includes the driver.

When you run the runfile, you may encounter this error in the logs:

**Makefile:974: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop.**

To resolve the issue, install the libelf development package:

dnf install elfutils-libelf-devel
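If you want to confirm that this is the error you hit before installing anything, a small helper can search the installer log. The function name check_orc_error is hypothetical, and the default log path /var/log/nvidia-installer.log is an assumption about where the driver installer writes its log; pass a different path if your log lives elsewhere.

```shell
#!/usr/bin/env bash
# check_orc_error [LOGFILE]  (hypothetical helper)
# Returns 0 and prints a hint if LOGFILE contains the ORC metadata error, 1 otherwise.
# Assumption: the driver installer logs to /var/log/nvidia-installer.log by default.
check_orc_error() {
  local log=${1:-/var/log/nvidia-installer.log}
  if grep -q 'Cannot generate ORC metadata' "$log" 2>/dev/null; then
    echo "ORC metadata error found in $log"
    echo "Fix: dnf install elfutils-libelf-devel, then re-run the installer"
    return 0
  fi
  return 1
}
```

After installing elfutils-libelf-devel, re-run the runfile; the log should no longer contain the ORC message.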

Installing NVIDIA DOCA-OFED on Rocky Linux

This is taken from the NVIDIA documentation Installing Nvidia DOCA OFED; read it for more information. Other relevant documentation includes:

Quick Reference

Installation Profiles

DOCA-Host Profile | Description
doca-ofed | Installs the same drivers and tools as MLNX_OFED using the DOCA-Host package, but without the other DOCA functionality.
doca-network | Intended for users who want to use only the networking functionality of the DOCA-Host package.
doca-all | Intended for users who want to use the full extent of the DOCA drivers and libraries (the full DOCA-Host installation).
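If a previous DOCA or MLNX_OFED installation exists, remove it first:

# Remove the installed DOCA OFED software from the host.
for f in $(rpm -qa | grep -i doca); do sudo dnf -y remove "$f"; done

# Remove the installed MLNX_OFED software.
sudo /usr/sbin/ofed_uninstall.sh --force

# Clean up leftover dependencies and refresh the repository metadata.
sudo dnf -y autoremove
sudo dnf clean all
sudo dnf makecache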

Download and Import the NVIDIA (Mellanox) RPM GPG Key

sudo wget http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox-SHA256
sudo rpm --import RPM-GPG-KEY-Mellanox-SHA256

DOCA-OFED

Create the repository file in /etc/yum.repos.d/:

touch /etc/yum.repos.d/doca.repo

Inside /etc/yum.repos.d/doca.repo, add the following:

[doca]
name=DOCA Online Repo
baseurl=https://linux.mellanox.com/public/repo/doca/3.2.1/rhel8/x86_64/
enabled=1
gpgcheck=0

Save and Exit
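Instead of creating and editing the file by hand, the same repository definition can be written from a script, which is handy when preparing several nodes. This is a sketch: write_doca_repo is a hypothetical helper, and it assumes other DOCA versions and architectures follow the same URL layout as the 3.2.1/rhel8/x86_64 path above; check the NVIDIA repository before relying on that.

```shell
#!/usr/bin/env bash
# write_doca_repo [VERSION] [REPO_FILE] [ARCH]  (hypothetical helper)
# Writes the [doca] repository definition.
# Assumption: other versions use the same .../doca/<version>/rhel8/<arch>/ layout.
write_doca_repo() {
  local version=${1:-3.2.1}
  local repo_file=${2:-/etc/yum.repos.d/doca.repo}
  local arch=${3:-$(uname -m)}
  cat > "$repo_file" <<EOF
[doca]
name=DOCA Online Repo
baseurl=https://linux.mellanox.com/public/repo/doca/${version}/rhel8/${arch}/
enabled=1
gpgcheck=0
EOF
}
```

Run it as root (for example write_doca_repo 3.2.1), then refresh the metadata with dnf makecache before installing.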

Install DOCA-OFED

dnf install -y doca-ofed

Validating That OFED and RoCEv2 Are Working

One of the quickest checks is to run ibstat:

CA 'mlx5_0'
	CA type: MT4127
	Number of ports: 1
	Firmware version: 26.43.2026
	Hardware version: 0
	Node GUID: 0x5000e6030073b514
	System image GUID: 0x5000e6030073b514
	Port 1:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x.....
		Port GUID: 0x......
		Link layer: Ethernet
CA 'mlx5_1'
	CA type: MT4127
	Number of ports: 1
	Firmware version: 26.43.2026
	Hardware version: 0
	Node GUID: 0x.....
	System image GUID: 0x.....
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 25
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x.......
		Port GUID: 0x.....
		Link layer: Ethernet
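On hosts with several adapters, the full ibstat output gets long. A minimal sketch that condenses it to one line per port is shown below; summarize_ibstat is a hypothetical helper, and it assumes the field layout shown above (a "CA '<name>'" line followed by indented port fields, with Link layer as the last field of each port).

```shell
#!/usr/bin/env bash
# summarize_ibstat  (hypothetical helper)
# Reads ibstat output on stdin and prints one line per port:
#   <CA> port <n>: <State>/<Physical state> rate=<Rate> link=<Link layer>
summarize_ibstat() {
  awk '
    /^CA /                   { ca = substr($2, 2, length($2) - 2) }  # strip the quotes around the CA name
    /^[ \t]*Port [0-9]+:/    { port = $2; sub(/:$/, "", port) }
    /^[ \t]*State:/          { state = $2 }
    /^[ \t]*Physical state:/ { phys = $3 }
    /^[ \t]*Rate:/           { rate = $2 }
    /^[ \t]*Link layer:/     { printf "%s port %s: %s/%s rate=%s link=%s\n", ca, port, state, phys, rate, $3 }
  '
}
```

Usage: ibstat | summarize_ibstat. For the output above, mlx5_0 would show Down/Disabled and mlx5_1 Active/LinkUp.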

For further checks, see Installing RoCE using Mellanox (Nvidia) OFED package.

Mapping an Assigned Logical Interface Name to the Hardware Brand

Method 1: Using ethtool and lspci

[root@hpc-node1 ~]# ethtool -i ens3f1np1
driver: mlx5_core
version: 25.10-1.7.1
firmware-version: 26.43.2026 (MT_0000000575)
expansion-rom-version: 
bus-info: 0000:5d:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@hpc-node1 ~]# lspci -s 0000:5d:00.1
0000:5d:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

Method 2: Using lshw

[root@hpc-wfly-i022 ~]# lshw -C network
.....
.....
*-network:1
       description: Ethernet interface
       product: MT2894 Family [ConnectX-6 Lx]
       vendor: Mellanox Technologies
       physical id: 0.1
       bus info: pci@0000:5d:00.1
       logical name: ens3f1
       version: 00
       serial: 50:00:e6:73:b5:15
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical fibre 1000bt-fd 10000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=4.18.0-553.54.1.el8_10.x86_64 firmware=26.43.2026 (MT_0000000575) latency=0 link=no multicast=yes port=fibre
       resources: iomemory:1f3f0-1f3ef irq:17 memory:1f3ffa000000-1f3ffbffffff memory:b5f00000-b5ffffff
.....
.....
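Both methods read information that sysfs already exposes: each entry in /sys/class/net has a device link pointing at its bus address and a driver link under that device. The hypothetical helper below walks that directory (overridable for testing) and prints interface, bus address (a PCI address such as 0000:5d:00.1 for PCI NICs) and driver; you can then pass the address to lspci -s, as in Method 1, to get the model name. Virtual interfaces such as lo have no device link and are skipped.

```shell
#!/usr/bin/env bash
# list_nic_pci [NET_DIR]  (hypothetical helper)
# Prints "interface bus-address driver" for every interface backed by a device.
list_nic_pci() {
  local net_dir=${1:-/sys/class/net} dev iface addr driver
  for dev in "$net_dir"/*; do
    [[ -e $dev/device ]] || continue   # virtual interfaces (lo, bridges) have no device link
    iface=${dev##*/}
    addr=$(basename "$(readlink -f "$dev/device")")
    if [[ -e $dev/device/driver ]]; then
      driver=$(basename "$(readlink -f "$dev/device/driver")")
    else
      driver=unknown
    fi
    printf '%s %s %s\n' "$iface" "$addr" "$driver"
  done
}
```

On the node above, the expected line for the second port would look like ens3f1np1 0000:5d:00.1 mlx5_core.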

Understanding Basic nmcli in Rocky Linux 9

In Rocky Linux 9, NetworkManager stores connection profiles in keyfile format by default, and the traditional ifcfg files that we used up to Rocky Linux 8 are deprecated, so the nmcli command-line tool (NetworkManager Command Line) becomes the primary way to configure networking. If you Google “Why is nmcli replacing ifcfg”, you will find a comprehensive list of the key reasons for the transition. The one I like best is this answer:

nmcli commands are designed to be easily automated and scripted (e.g., using Ansible), offering better control and error checking (syntax validation) compared to generating flat text files through scripts.

Usage 1a: List the NetworkManager connection profiles

# nmcli con
NAME   UUID                                  TYPE      DEVICE 
ens33  xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ethernet  ens33  
lo     yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy  loopback  lo 

Usage 1b: List the Network Devices and their status

# nmcli dev
DEVICE  TYPE      STATE                   CONNECTION 
ens33   ethernet  connected               ens33      
lo      loopback  connected (externally)  lo        

Usage 2a: Disable the connection of ens33

# nmcli con down ens33
Connection 'ens33' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)

Usage 2b: Enable the connection of ens33

# nmcli con up ens33
Connection 'ens33' successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)

Usage 2c: Show Connection Details

# nmcli con show ens33
connection.id:                          ens33
connection.uuid:                        817c4ac5-49f4-3752-9a16-9d7460bed1c9
connection.stable-id:                   --
connection.type:                        802-3-ethernet
connection.interface-name:              ens33
connection.autoconnect:                 yes
connection.autoconnect-priority:        -999
connection.autoconnect-retries:         -1 (default)
connection.multi-connect:               0 (default)
connection.auth-retries:                -1
connection.timestamp:                   1763952141
connection.permissions:                 --
connection.zone:                        --
connection.controller:                  --
connection.master:                      --
connection.slave-type:                  --
connection.port-type:                   --
connection.autoconnect-slaves:          -1 (default)
connection.autoconnect-ports:           -1 (default)
connection.down-on-poweroff:            -1 (default)
connection.secondaries:                 --
connection.gateway-ping-timeout:        0
connection.ip-ping-timeout:             0
connection.ip-ping-addresses:           --
connection.ip-ping-addresses-require-all:-1 (default)
connection.metered:                     unknown
connection.lldp:                        default
.....
.....

Usage 3: Set the static IP Address of the Ethernet Connection

# nmcli con mod ens33 ipv4.method manual ipv4.addresses 10.10.1.2/24 ipv4.gateway 10.10.1.1
# nmcli con up ens33
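The two commands above can be wrapped in a small function that does a rough format check on the address before calling nmcli. This is a sketch: set_static_ip is a hypothetical name, the regex only checks the shape of an IPv4 address with prefix (not each octet's range), and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# set_static_ip CONNECTION ADDRESS/PREFIX GATEWAY  (hypothetical helper)
# Set DRY_RUN=1 to print the nmcli commands instead of running them.
set_static_ip() {
  local conn=$1 cidr=$2 gw=$3
  # Rough shape check only: a.b.c.d/nn, without validating octet ranges.
  if [[ ! $cidr =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]]; then
    echo "invalid address: $cidr (expected a.b.c.d/prefix)" >&2
    return 1
  fi
  local -a mod=(nmcli con mod "$conn" ipv4.method manual ipv4.addresses "$cidr" ipv4.gateway "$gw")
  local -a up=(nmcli con up "$conn")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${mod[*]}"
    echo "${up[*]}"
  else
    "${mod[@]}" && "${up[@]}"
  fi
}
```

Usage: set_static_ip ens33 10.10.1.2/24 10.10.1.1. The connection is only re-activated if the modify step succeeds.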

Usage 4a: Using nmcli to update DNS (replaces manual editing of /etc/resolv.conf)

# nmcli con mod ens33 ipv4.dns '8.8.8.8,8.8.4.4'
# nmcli con show ens33 | grep dns
# nmcli con up ens33

In /etc/resolv.conf, you will see:

# Generated by NetworkManager
search myown.domain.com
nameserver 8.8.8.8
nameserver 8.8.4.4
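To confirm that the nameservers set with nmcli actually reached /etc/resolv.conf, a small helper can list them and warn if the file does not carry NetworkManager's header (for example, if it was written by hand). The function name resolv_nameservers is hypothetical; it relies only on the "# Generated by NetworkManager" line shown above.

```shell
#!/usr/bin/env bash
# resolv_nameservers [FILE]  (hypothetical helper)
# Prints the nameserver entries from a resolv.conf-style file, one per line,
# and warns on stderr if the file lacks NetworkManager's header.
resolv_nameservers() {
  local file=${1:-/etc/resolv.conf}
  if ! grep -q '^# Generated by NetworkManager' "$file"; then
    echo "warning: $file does not look like it was generated by NetworkManager" >&2
  fi
  awk '$1 == "nameserver" { print $2 }' "$file"
}
```

Comparing this output with the ipv4.dns value from nmcli con show ens33 is a quick way to spot a resolv.conf that NetworkManager is not managing.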

Usage 4b: Using nmcli to update the DNS search domain (replaces manual editing of /etc/resolv.conf)

# nmcli con mod ens33 ipv4.dns-search 'myown.domain.com'
# nmcli con up ens33

Usage 5a: Disable IPv6

# nmcli con mod ens33 ipv6.method "disabled"
# nmcli con up ens33
# nmcli con show ens33
.....
....
ipv6.method:                            disabled
ipv6.dns:                               --
ipv6.dns-search:                        --
ipv6.dns-options:                       --
ipv6.dns-priority:                      0
ipv6.addresses:                         --
....
.....

Display the IP settings of the device. If no inet6 entry is displayed, IPv6 is disabled on the device.

# ip address show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
altname enp2s1
inet 192.168.x.x/19 brd 192.168.x.x scope global noprefixroute ens33
    valid_lft forever preferred_lft forever
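When checking many servers from a script, the same "no inet6 entry" rule can be turned into a pass/fail test. This minimal sketch uses a hypothetical helper, no_inet6, that reads ip address show output on stdin and succeeds only when no inet6 line is present.

```shell
#!/usr/bin/env bash
# no_inet6  (hypothetical helper)
# Reads `ip address show DEV` output on stdin.
# Returns 0 when no inet6 line is present, i.e. IPv6 is disabled on the device.
no_inet6() {
  ! grep -qw 'inet6'
}
```

Usage: ip address show ens33 | no_inet6 && echo "IPv6 is disabled on ens33".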


driverExceptions.EnvironmentFileReadError on ABAQUS 2025 Hotfix 4 for Rocky Linux 8

If you encounter this error when you run “abaqus cae”:

driverExceptions.EnvironmentFileReadError: /...../......../abaqus_v6.env
File "SMAPyaModules/SMAPyaDriverPy.m/src/driverUtilsCae.py", line 44, in executeOnCaeGraphicsStartup
File "SMAPyaModules/SMAPyaDriverPy.m/src/driverUtilsCae.py", line 28, in callStartupMethod
File "SMAPylModules/SMAPylDriverPy.m/src/driverEnv.py", line 878, in read
File "SMAPylModules/SMAPylDriverPy.m/src/driverEnv.py", line 770, in _updateEnvFromFile
File "SMAPylModules/SMAPylDriverPy.m/src/driverEnv.py", line 672, in _readEnvironmentFile
File "SMAPylModules/SMAPylDriverPy.m/src/driverEnv.py", line 391, in envRunFile

Abaqus Error: Abaqus/CAE Kernel exited with an error.

The solution is super easy. Just set LANG to a UTF-8 locale before starting Abaqus/CAE:

export LANG=en_US.UTF-8
abaqus cae
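If you prefer not to change the locale for the whole shell session, the same fix can be scoped to the Abaqus process only. This is a sketch: abaqus_cae is a hypothetical wrapper, and the locale check assumes that on Rocky Linux 8 the en_US locales are provided by the glibc-langpack-en package.

```shell
#!/usr/bin/env bash
# abaqus_cae [ARGS...]  (hypothetical wrapper)
# Starts Abaqus/CAE with LANG=en_US.UTF-8 for this process only.
# Assumption: on Rocky Linux 8 the en_US locales come from glibc-langpack-en.
abaqus_cae() {
  local want=en_US.UTF-8
  # locale -a usually lists it as en_US.utf8; match case-insensitively.
  if ! locale -a 2>/dev/null | grep -qix 'en_US\.utf-\{0,1\}8'; then
    echo "warning: $want is not installed; try: dnf install glibc-langpack-en" >&2
  fi
  LANG=$want abaqus cae "$@"
}
```

Extra arguments are passed through, so abaqus_cae noGUI=script.py runs abaqus cae noGUI=script.py with the UTF-8 locale. Putting export LANG=en_US.UTF-8 in ~/.bashrc works too if you want it for every session.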