Optimizing Ansible Performance: Implementing Parallelism with Forks

Ansible’s parallel processes are known as forks, and the default number of forks is five. In other words, Ansible attempts to run automation jobs on five hosts simultaneously. The more forks you set, the more resources are consumed on the Ansible control node.

How do you implement this? Edit the ansible.cfg file and look for the forks parameter. You can use the command “ansible-config view” to display the contents of the active ansible.cfg:

[defaults]
inventory = inventory
private_key_file = ~/.ssh/xxxxxx
become = true
become_user = root
timeout = 30
forks = 10
log_path = /var/log/ansible.log
display_skipped_hosts=yes
display_ok_hosts=yes
display_failed_stderr=yes
show_custom_stats=yes
verbosity = 0
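
If you prefer not to change ansible.cfg globally, the fork count can also be overridden for a single run with the -f/--forks option, or through the ANSIBLE_FORKS environment variable (site.yml below is just a stand-in for your own playbook):

$ ansible-playbook site.yml --forks 20
$ ANSIBLE_FORKS=20 ansible-playbook site.yml

Keep in mind that raising forks only helps if the control node has the CPU and memory to sustain the extra parallel connections.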

References and Other Useful Information:

  1. How to implement parallelism and rolling updates in Ansible
  2. Ansible Update Management: Serial Execution and Percentage Indicators

Ansible Delayed Error Handling with Rescue Blocks: Chrony Setup Example

As a recap, there are two main uses of blocks in Ansible (the first write-up can be found at Grouping Tasks with Block in Ansible):

  1. Apply conditional logic to all the tasks within the block, so the logic needs to be declared only once.
  2. Apply error handling, especially when recovering from an error condition.

We will deal with Point 2 in this blog entry.

According to the Ansible documentation at https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_blocks.html:

Rescue blocks specify tasks to run when an earlier task in a block fails. This approach is similar to exception handling in many programming languages. Ansible only runs rescue blocks after a task returns a ‘failed’ state. Bad task definitions and unreachable hosts will not trigger the rescue block.

Here is a simple example implementation:

- name: Check current Timezone
  command: timedatectl show --property=Timezone --value
  register: timezone_output
  changed_when: false

- name: Configure Timezone to Asia/Singapore
  command: timedatectl set-timezone Asia/Singapore
  when: timezone_output.stdout != "Asia/Singapore"

- name: Install and Configure Chrony Service Block
  block:
    - name: Install Chrony package
      dnf:
        name: chrony
        state: present

    - name: Configure Chrony servers
      lineinfile:
        path: /etc/chrony.conf
        line: "server sg.pool.ntp.org iburst"
        insertafter: '^#.*server 3.centos.pool.ntp.org iburst'
        state: present

    - name: Enable Chrony service
      service:
        name: chronyd
        state: started
        enabled: yes
  rescue:
    - name: Print when Errors
      debug:
        msg: 'Something failed at Chrony Setup'
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
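
One behaviour worth knowing: once the rescue tasks finish successfully, Ansible treats the host as recovered and the play carries on. If you would rather log the message and still have the play end in a failed state, you can finish the rescue section with an explicit fail task; a minimal sketch (my own extension, not from the documentation):

  rescue:
    - name: Print when Errors
      debug:
        msg: 'Something failed at Chrony Setup'

    - name: Re-raise the failure so the host is still marked as failed
      fail:
        msg: 'Chrony setup failed on {{ inventory_hostname }}'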

Efficient Task Grouping with Ansible: Timezone Configuration Example

Ansible allows us to logically group a set of tasks together, and then:

  1. Apply conditional logic to all the tasks within the block, so the logic needs to be declared only once.
  2. Apply error handling, especially when recovering from an error condition.

We will deal with Point 1 in this blog entry.

Point 1: Conditional Logic

- name: Check current Timezone
  command: timedatectl show --property=Timezone --value
  register: timezone_output
  changed_when: false

- name: Configure Timezone to Asia/Singapore
  command: timedatectl set-timezone Asia/Singapore
  when: timezone_output.stdout != "Asia/Singapore"

- name: Install and Configure Chrony Service Block
  block:
    - name: Install Chrony package
      dnf:
        name: chrony
        state: present

    - name: Configure Chrony servers
      lineinfile:
        path: /etc/chrony.conf
        line: "server sg.pool.ntp.org iburst"
        insertafter: '^#.*server 3.centos.pool.ntp.org iburst'
        state: present

    - name: Enable Chrony service
      service:
        name: chronyd
        state: started
        enabled: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Reference:

  1. Blocks: https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_blocks.html

Automating Security Patch Logs and MS Teams Notifications with Ansible on Rocky Linux 8

If you have read the blog entry Using Ansible to automate Security Patch on Rocky Linux 8, you may want to consider capturing the logs and sending a notification to MS Teams if you are using it as a communication channel. This is a follow-up to that blog.

Please look at Part 1: Using Ansible to automate Security Patch on Rocky Linux 8

Writing logs (Option 1: Ansible Command used if just checking)

Recall that in Option 1 (the Ansible command used if just checking), Parts 1a and 1b, you can consider writing the output to logs in /var/log/ansible_logs.

- name: Create a directory if it does not exist
  file:
    path: /var/log/ansible_logs
    state: directory
    mode: '0755'
    owner: root
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

- name: Copy Results to file
  ansible.builtin.copy:
    content: "{{ register_output_security.results | map(attribute='name') | list }}"
    dest: /var/log/ansible_logs/patch-list_{{ansible_date_time.date}}.log
  changed_when: false
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Notification (Option 1: Ansible Command used if just checking)

You can post to MS Teams to provide a short notification letting the engineers know that the logs have been written to /var/log/ansible_logs.

- name: Send a notification to MS-Teams that Test Run (No Patching) is completed
  run_once: true
  uri:
    url: "https://xxxxxxx.webhook.office.com/webhookb2/xxxxxxxxxxxxxxxxxxxxxxxxx"
    method: POST
    body_format: json
    body:
      title: "Test Patch Run on {{ansible_date_time.date}}"
      text: "Test Run only. System has not been Patched Yet. Logs saved at: /var/log/ansible_logs/patch-list_{{ansible_date_time.date}}.log"
  when:
    - register_update_success is defined
    - ext_permit_flag == "no"

Writing to MS Teams to capture the success or failure of the update (Option 2: Ansible command used when ready for patching)

- name: Send a notification to MS-Teams Channel if Upgrade failed
  run_once: true
  uri:
    url: "https://xxxxx.webhook.office.com/webhookb2/xxxxxx"
    method: POST
    body_format: json
    body:
      title: "Patch Run on {{ansible_date_time.date}}"
      text: "Patch Update has Failed"
  when:
    - register_update_success is not defined
    - ext_permit_flag == "yes"

- name: Send a notification to MS-Teams Channel if Upgrade succeeded
  run_once: true
  uri:
    url: "https://entuedu.webhook.office.com/webhookb2/xxxxxx"
    method: POST
    body_format: json
    body:
      title: "Patch Run on {{ansible_date_time.date}}"
      text: "Patch Update is Successful. Logs saved at: /var/log/ansible_logs/patch-list_{{ansible_date_time.date}}.log"
  when:
    - register_update_success is defined
    - ext_permit_flag == "yes"
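
One caution on the snippets above: the webhook URL is effectively a credential. Rather than leaving it in plain text, you could encrypt it with Ansible Vault and reference it as a variable (msteams_webhook_url is a hypothetical name of your choosing):

$ ansible-vault encrypt_string 'https://xxxxx.webhook.office.com/webhookb2/xxxxxx' --name 'msteams_webhook_url'

Then in the uri task, point url at the variable instead:

    url: "{{ msteams_webhook_url }}"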

Automating Linux Patching with Ansible: A Simple Guide

If you intend to use Ansible to patch servers, you may want to use an external variable to decide whether you just want to look at the list of pending updates or actually patch the OS. The playbook consists of three parts.
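
One defensive touch you may want to add: if the playbook is run without --extra-vars at all, ext_permit_flag is undefined and every when test against it will error out. A minimal sketch, using a hypothetical internal variable permit_flag, that defaults to the safe check-only mode:

- hosts: all
  vars:
    # Hypothetical guard: fall back to check-only mode ("no") when
    # ext_permit_flag is not supplied on the command line
    permit_flag: "{{ ext_permit_flag | default('no') }}"

The when conditions below would then test permit_flag instead of ext_permit_flag.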

Option 1: Ansible Command used if just checking

$ ansible-playbook security.yml --extra-vars "ext_permit_flag=no"

Part 1a: Get the list of packages from DNF to be upgraded, only when the External Permit Flag = “no”

- name: Get the list of Packages from DNF to be upgraded (ext_permit_flag == "no")
  dnf:
    security: yes
    bugfix: false
    state: latest
    update_cache: yes
    list: updates
    exclude: 'kernel*'
  register: register_output_security
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - ext_permit_flag == "no"

Part 1b: Report the list of packages from DNF to be upgraded, only when the External Permit Flag = “no”

- name: Report the List of Packages from DNF to be upgraded (ext_permit_flag == "no")
  debug:
    msg: "{{ register_output_security.results | map(attribute='name') | list }}"
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - ext_permit_flag == "no"

Option 2: Ansible Command used when ready for Patching

$ ansible-playbook security.yml --extra-vars "ext_permit_flag=yes"

Part 2: Patch all the packages except Kernel

- name: Patch all the packages except Kernel
  dnf:
    name: '*'
    security: yes
    bugfix: false
    state: latest
    update_cache: yes
    update_only: no
    exclude: 'kernel*'
  register: register_update_success
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - ext_permit_flag == "yes"

- name: Print Errors if upgrade failed
  debug:
    msg: "Patch Update Failed"
  when: register_update_success is not defined

Reference:

  1. ansible.builtin.dnf module – Manages packages with the dnf package manager
  2. Automating Linux patching with Ansible

Optimizing Firewalld Configuration with Ansible’s with_items Parameter

Ansible is great for configuring a host-based firewall like firewalld. One thing you will note is that we use the with_items parameter a lot; it is very useful here, since each rule carries several parameters (port, protocol, state, zone) within each item.

- name: FirewallD Rules (Ports)
  firewalld:
    permanent: yes
    immediate: yes
    port: "{{item.port}}/{{item.proto}}"
    state: "{{item.state}}"
    zone: "{{item.zone}}"
  with_items:
    - {port: "80", proto: "tcp", state: "enabled", zone: "public" }
    - {port: "80", proto: "udp", state: "enabled", zone: "public" }
    - {port: "443", proto: "tcp", state: "disabled", zone: "public" }
    - {port: "443", proto: "udp", state: "disabled", zone: "public" }


- name: FirewallD Rules (Services)
  firewalld:
    permanent: yes
    immediate: yes
    service: "{{item.service}}"
    state: "{{item.state}}"
    zone: "{{item.zone}}"
  with_items:
    - {service: "cockpit", state: "disabled", zone: "public" }

- name: Turn on Firewalld.service on Compute Nodes
  systemd:
    name: firewalld
    state: started
    enabled: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
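
As an aside, in current Ansible releases loop is the recommended successor to with_items, and the rules task above works the same way written as:

- name: FirewallD Rules (Ports)
  firewalld:
    permanent: yes
    immediate: yes
    port: "{{item.port}}/{{item.proto}}"
    state: "{{item.state}}"
    zone: "{{item.zone}}"
  loop:
    - {port: "80", proto: "tcp", state: "enabled", zone: "public" }
    - {port: "443", proto: "tcp", state: "disabled", zone: "public" }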

Mounting and Unmounting NFS File Systems Using Ansible: Essential Tutorial

You can use Ansible to automate the configuration of NFS client settings:

1. Mount an NFS File system, and configure in /etc/fstab

Use state: mounted

- name: Mount NFS Share nfs-server:/usr/local
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: mounted

2. Unmount an NFS File System, but leave /etc/fstab unmodified

Use state: unmounted

- name: Unmount NFS Share nfs-server:/usr/local
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: unmounted

3. Unmount an NFS File System, and remove the settings from /etc/fstab

Use state: absent

- name: Unmount NFS Share nfs-server:/usr/local and remove from /etc/fstab
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: absent

4. Remount an NFS System, without chaning /etc/fstab

Use state: remounted

- name: Remount NFS Share nfs-server:/usr/local
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: remounted
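
After any of these state changes, a quick manual check on the client confirms the result, for example:

$ mount -t nfs
$ grep usr_local /etc/fstab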

References:

  1. ansible.posix.mount module – Control active and configured mount points
  2. Mounting and un-mounting a volume in Ansible

Disabling Avahi-Daemon on CentOS 7

I was having a bit of difficulty turning off the avahi-daemon service (which provides mDNS), since I do not need it. When I used the command:

# systemctl status avahi-daemon
● avahi-daemon.service - Avahi mDNS/DNS-SD Stack
   Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-08-28 08:46:26 +08; 14h ago
 Main PID: 36457 (avahi-daemon)
   Status: "avahi-daemon 0.6.31 starting up."
    Tasks: 2
   Memory: 676.0K
   CGroup: /system.slice/avahi-daemon.service
           ├─36457 avahi-daemon: running [hpc-r001.local]
           └─36494 avahi-daemon: chroot helper
.....
.....
.....
 

Unable to Stop ???

I tried to stop it, but the daemon did not stop….. Hmmmmm

# systemctl stop avahi-daemon
Warning: Stopping avahi-daemon.service, but it can still be activated by:
  avahi-daemon.socket
[root@hpc-r001 ~]# systemctl status avahi-daemon
● avahi-daemon.service - Avahi mDNS/DNS-SD Stack
   Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-08-28 23:11:54 +08; 10s ago
 Main PID: 372559 (avahi-daemon)
   Status: "avahi-daemon 0.6.31 starting up."
    Tasks: 2
   Memory: 704.0K
   CGroup: /system.slice/avahi-daemon.service
           ├─372559 avahi-daemon: running [hpc-r001.local]
           └─372563 avahi-daemon: chroot helper

Unable to Disable ???

I tried to disable as well. But…… still alive?

# systemctl disable avahi-daemon
Removed symlink /etc/systemd/system/multi-user.target.wants/avahi-daemon.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.Avahi.service.
Removed symlink /etc/systemd/system/sockets.target.wants/avahi-daemon.socket.
[root@hpc-r001 ~]# systemctl status avahi-daemon
● avahi-daemon.service - Avahi mDNS/DNS-SD Stack
   Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-08-28 23:12:25 +08; 2min 32s ago
 Main PID: 372707 (avahi-daemon)
   Status: "avahi-daemon 0.6.31 starting up."
   CGroup: /system.slice/avahi-daemon.service
           ├─372707 avahi-daemon: running [hpc-r001.local]
           └─372709 avahi-daemon: chroot helper

Finally… mask, disable, then stop.

To prevent a service from running, you need to “mask” it first. Masking points the unit file at /dev/null, so nothing, not even the still-active avahi-daemon.socket, can start the service again.

# systemctl mask avahi-daemon
Created symlink from /etc/systemd/system/avahi-daemon.service to /dev/null.
# systemctl disable avahi-daemon
# systemctl stop avahi-daemon

# systemctl status avahi-daemon
● avahi-daemon.service
   Loaded: masked (/dev/null; bad)
   Active: inactive (dead) since Mon 2023-08-28 23:15:42 +08; 20min ago
 Main PID: 372707 (code=exited, status=0/SUCCESS)
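
Since most of this blog is about Ansible anyway, the same mask-disable-stop sequence can be expressed with the systemd module, which has a masked parameter; a minimal sketch (masking the socket first so it cannot re-activate the service):

- name: Mask, disable and stop the Avahi units
  systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
    masked: yes
  with_items:
    - avahi-daemon.socket
    - avahi-daemon.service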

Having kernel hung_task_timeout_secs Issues

There is a good article, Linux Kernel panic issue: How to fix hung_task_timeout_secs and blocked for more than 120 seconds problem, which provides an explanation of and solution to kernel hung_task_timeout_secs issues.

By default, Linux uses up to 40% of the available memory for file system caching. After this mark is reached, the file system flushes all outstanding data to disk, causing all following I/Os to go synchronous. For flushing this data to disk there is a time limit of 120 seconds by default. In the case here, the I/O subsystem is not fast enough to flush the data within 120 seconds. As the I/O subsystem responds slowly and more requests keep arriving, system memory fills up, resulting in the error above.

(Source: Linux Kernel panic issue: How to fix hung_task_timeout_secs and blocked for more than 120 seconds problem)

Resolution

Change vm.dirty_ratio and vm.dirty_background_ratio

# sysctl -w vm.dirty_ratio=10
# sysctl -w vm.dirty_background_ratio=5
# sysctl -p

If you wish to make the change permanent, add these two lines to /etc/sysctl.conf:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
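
If you manage these hosts with Ansible, the ansible.posix.sysctl module applies the value immediately and persists it in one step; a minimal sketch:

- name: Lower the dirty page ratios to start flushing to disk earlier
  ansible.posix.sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
  with_items:
    - {name: "vm.dirty_background_ratio", value: "5" }
    - {name: "vm.dirty_ratio", value: "10" }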

Automating the Linux Client Server for Centrify and 2FA on Rocky Linux 8

The whole manual setup, including the Active Directory portion, can be found at Preparing a Linux Client Server for Centrify and 2FA for CentOS-7.

If you just want to automate the Linux portion, here is something you may wish to consider.

Update the sshd_config template. The most important portion is that “PasswordAuthentication no” and “ChallengeResponseAuthentication yes” are present. (The whole sshd_config template is too large to reproduce in this blog.)

.....
.....
# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no
PasswordAuthentication no

# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes
ChallengeResponseAuthentication yes
.....
.....
- name: Generate /etc/ssh/sshd_config from /etc/ssh/sshd_config.j2 template
  template:
      src: ../templates/sshd_config.j2
      dest: /etc/ssh/sshd_config
      owner: root
      group: root
      mode: 0600
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

- name: Restart SSH Service
  systemd:
    name: sshd
    state: restarted
    enabled: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
  changed_when: false
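
A small refinement worth considering: restarting sshd unconditionally and hiding it with changed_when: false works, but the more idiomatic pattern is to notify a handler from the template task so sshd restarts only when the file actually changes. A sketch:

- name: Generate /etc/ssh/sshd_config from /etc/ssh/sshd_config.j2 template
  template:
    src: ../templates/sshd_config.j2
    dest: /etc/ssh/sshd_config
    owner: root
    group: root
    mode: 0600
  notify: Restart SSH Service

# ...and in the play's handlers: section
handlers:
  - name: Restart SSH Service
    systemd:
      name: sshd
      state: restarted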

Here is Centrify_2FA.yml to insert the IwaTrustRoot.pem certificate:

- name: Copy IwaTrustRoot.pem to /etc/pki/ca-trust/source/anchors/
  template:
      src: /usr/local/software/certificate/IwaTrustRoot.pem
      dest: /etc/pki/ca-trust/source/anchors/
      owner: root
      group: root
      mode: 0600
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

- name: Copy IwaTrustRoot.pem to /var/centrify/net/certs
  template:
      src: /usr/local/software/certificate/IwaTrustRoot.pem
      dest: /var/centrify/net/certs
      owner: root
      group: root
      mode: 0600
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
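
Depending on your environment, after dropping a certificate into /etc/pki/ca-trust/source/anchors/ you will usually also need to refresh the system trust store; a minimal sketch:

- name: Refresh the CA trust store after adding IwaTrustRoot.pem
  ansible.builtin.command: update-ca-trust extract
  become: true
  changed_when: false

Also note that since IwaTrustRoot.pem is a static file, ansible.builtin.copy would work in place of template and avoids any accidental Jinja2 rendering of the certificate contents.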

Restart the CentrifyDC service and flush the cache so that the Active Directory information is refreshed.

- name: CentrifyDC Restart
  ansible.builtin.shell: "/usr/share/centrifydc/bin/centrifydc restart"
  register: centrifydc_status
  changed_when: false

- name: Active Directory Flush
  ansible.builtin.shell: "adflush -f"
  register: flush_status
  changed_when: false

- name: Centrify Service Restarted
  debug:
    msg: "CentrifyDC restart output: {{ centrifydc_status.stdout }}"