May 12, 2024 by kittycool only

Optimizing Ansible Performance: Implementing Parallelism with Forks

Ansible’s parallel processes are known as forks, and the default number of forks is five. In other words, Ansible attempts to run automation jobs on 5 hosts simultaneously. The more forks you set, the more resources are used on the Ansible control node.

How do you implement it? Just edit the ansible.cfg file and set the “forks” parameter in the [defaults] section. You can use the command “ansible-config view” to display the ansible.cfg file that is currently in use.

[defaults]
inventory = inventory
private_key_file = ~/.ssh/xxxxxx
become = true
become_user = root
timeout = 30
forks = 10
log_path = /var/log/ansible.log
display_skipped_hosts=yes
display_ok_hosts=yes
display_failed_stderr=yes
show_custom_stats=yes
verbosity = 0
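
If you want to confirm which value is actually in effect, a small playbook can print it. This is only a minimal sketch: the file name check_forks.yml is my own assumption, and it relies on the ansible_forks special variable that Ansible exposes for each run.

```yaml
# check_forks.yml (hypothetical file name)
- name: Show the fork count in effect for this run
  hosts: all
  gather_facts: false
  tasks:
    - name: Print the value of forks
      ansible.builtin.debug:
        msg: "This run uses up to {{ ansible_forks }} forks"
      run_once: true
```

Running "ansible-playbook check_forks.yml" should report the value from ansible.cfg, while "ansible-playbook -f 20 check_forks.yml" overrides it for a single run.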

May 12, 2024 by kittycool only

Ansible Delayed Error Handling with Rescue Blocks: Chrony Setup Example

As a recap, there are 2 main uses of Blocks in Ansible. The first write-up can be found at Grouping Tasks with Block in Ansible.

1. Apply conditional logic to all the tasks within the block, so that the logic only needs to be declared once.
2. Apply error handling, especially when recovering from an error condition.

Today, we will deal with Point 2 in this blog entry.

According to Ansible Documentation found at https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_blocks.html

Rescue blocks specify tasks to run when an earlier task in a block fails. This approach is similar to exception handling in many programming languages. Ansible only runs rescue blocks after a task returns a ‘failed’ state. Bad task definitions and unreachable hosts will not trigger the rescue block.

Here is my simple example implementation:

- name: Check current Timezone
  command: timedatectl show --property=Timezone --value
  register: timezone_output
  changed_when: false

- name: Configure Timezone to Asia/Singapore
  command: timedatectl set-timezone Asia/Singapore
  when: timezone_output.stdout != "Asia/Singapore"

- name: Install and Configure Chrony Service Block
  block:
    - name: Install Chrony package
      dnf:
        name: chrony
        state: present

    - name: Configure Chrony servers
      lineinfile:
        path: /etc/chrony.conf
        line: "server sg.pool.ntp.org iburst"
        insertafter: '^#.*server 3.centos.pool.ntp.org iburst'
        state: present

    - name: Enable Chrony service
      service:
        name: chronyd
        state: started
        enabled: yes
  rescue:
    - name: Print when Errors
      debug:
        msg: 'Something failed at Chrony Setup'
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
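
If you want the rescue section to say which task failed, Ansible provides the ansible_failed_task and ansible_failed_result variables inside rescue. The following is only a sketch built on the example above, not a tested production playbook; the task names and messages are my own, and the always section is included just to show where unconditional cleanup would go.

```yaml
- name: Install Chrony with failure details (sketch)
  block:
    - name: Install Chrony package
      ansible.builtin.dnf:
        name: chrony
        state: present
  rescue:
    # ansible_failed_task and ansible_failed_result are only set inside rescue
    - name: Report which task failed and why
      ansible.builtin.debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no message returned') }}
  always:
    # Runs whether or not a task in the block failed
    - name: Note that the Chrony setup attempt has finished
      ansible.builtin.debug:
        msg: 'Chrony setup attempt finished'
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
```

Keep in mind that once the rescue section completes successfully, the host is no longer treated as failed and the play continues with the following tasks.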

May 8, 2024 by kittycool only

Efficient Task Grouping with Ansible: Timezone Configuration Example

Ansible allows us to logically group a set of tasks together, and then:

1. Apply conditional logic to all the tasks within the block, so that the logic only needs to be declared once.
2. Apply error handling, especially when recovering from an error condition.

We will deal with Point 1 in this blog entry.

Point 1: Conditional Logic

- name: Check current Timezone
  command: timedatectl show --property=Timezone --value
  register: timezone_output
  changed_when: false

- name: Configure Timezone to Asia/Singapore
  command: timedatectl set-timezone Asia/Singapore
  when: timezone_output.stdout != "Asia/Singapore"

- name: Install and Configure Chrony Service Block
  block:
    - name: Install Chrony package
      dnf:
        name: chrony
        state: present

    - name: Configure Chrony servers
      lineinfile:
        path: /etc/chrony.conf
        line: "server sg.pool.ntp.org iburst"
        insertafter: '^#.*server 3.centos.pool.ntp.org iburst'
        state: present

    - name: Enable Chrony service
      service:
        name: chronyd
        state: started
        enabled: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Reference:

Blocks

December 30, 2023 by kittycool only

Automating Security Patch Logs and MS-Team Notifications with Ansible on Rocky Linux 8

If you have read the blog entry Using Ansible to automate Security Patch on Rocky Linux 8, you may want to consider capturing the logs and sending a notification to MS Teams if you are using that as a communication channel. This is a follow-up to that blog.

Please look at Part 1: Using Ansible to automate Security Patch on Rocky Linux 8

Writing logs (Option 1: Ansible Command used if just checking)

Recall Part 1a and Part 1b of Option 1: Ansible Command used if just checking. You can consider writing their results to log files in /var/log/ansible_logs.

- name: Create a directory if it does not exist
  file:
    path: /var/log/ansible_logs
    state: directory
    mode: '0755'
    owner: root
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"



- name: Copy Results to file
  ansible.builtin.copy:
    content: "{{ register_output_security.results | map(attribute='name') | list }}"
    dest: /var/log/ansible_logs/patch-list_{{ansible_date_time.date}}.log
  changed_when: false
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
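
The task above writes the package names as a single list. If you prefer one package name per line, a variation along these lines may help. It is only a sketch: it assumes register_output_security has the same structure as in Part 1a, and the mode value is my own choice.

```yaml
- name: Copy results to file, one package per line (sketch)
  ansible.builtin.copy:
    # The Jinja2 loop emits each package name on its own line
    content: |
      {% for pkg in register_output_security.results %}
      {{ pkg.name }}
      {% endfor %}
    dest: "/var/log/ansible_logs/patch-list_{{ ansible_date_time.date }}.log"
    mode: '0644'
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
```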

Notification (Option 1: Ansible Command used if just checking)

You can post a short notification to MS Teams to let the engineers know that the logs have been written to /var/log/ansible_logs.

- name: Send a notification to MS-Teams that Test Run (No Patching) is completed
  run_once: true
  uri:
    url: "https://xxxxxxx.webhook.office.com/webhookb2/xxxxxxxxxxxxxxxxxxxxxxxxx"
    method: POST
    body_format: json
    body:
      title: "Test Patch Run on {{ansible_date_time.date}}"
      text: "Test Run only. System has not been Patched Yet. Logs saved at: /var/log/ansible_logs/patch-list_{{ansible_date_time.date}}.log"
  when:
    - register_update_success is defined
    - ext_permit_flag == "no"

Writing to MS Teams to capture the success or failure of the Update (Option 2: Ansible Command used when ready for Patching)

- name: Send a notification to MS-Teams Channel if Upgrade failed
  run_once: true
  uri:
    url: "https://xxxxx.webhook.office.com/webhookb2/xxxxxx"
    method: POST
    body_format: json
    body:
      title: "Patch Run on {{ansible_date_time.date}}"
      text: "Patch Update has Failed"
  when:
    - register_update_success is not defined
    - ext_permit_flag == "yes"



- name: Send a notification to MS-Teams Channel if Upgrade succeeded
  run_once: true
  uri:
    url: "https://entuedu.webhook.office.com/webhookb2/xxxxxx"
    method: POST
    body_format: json
    body:
      title: "Patch Run on {{ansible_date_time.date}}"
      text: "Patch Update is Successful. Logs saved at: /var/log/ansible_logs/patch-list_{{ansible_date_time.date}}.log"
  when:
    - register_update_success is defined
    - ext_permit_flag == "yes"
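
The two notifications can also be folded into one task that picks its message from the registered result. Treat this as a sketch under assumptions: teams_webhook_url is a hypothetical variable (for example kept in Ansible Vault), and the failure message is only reachable if the patch task is allowed to continue on error, for example with ignore_errors: true.

```yaml
- name: Send the patch result to MS-Teams Channel (sketch)
  run_once: true
  ansible.builtin.uri:
    url: "{{ teams_webhook_url }}"   # hypothetical variable holding the webhook URL
    method: POST
    body_format: json
    body:
      title: "Patch Run on {{ ansible_date_time.date }}"
      # "is failed" tests the status of the registered result
      text: "{{ 'Patch Update has Failed' if register_update_success is failed else 'Patch Update is Successful' }}"
  when:
    - register_update_success is defined
    - register_update_success is not skipped
    - ext_permit_flag == "yes"
```

Testing the result with "is failed" checks the outcome of the task itself rather than whether the variable exists.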

December 25, 2023 by kittycool only

Automating Linux Patching with Ansible: A Simple Guide

If you intend to use Ansible to patch the server, you can use an external variable to decide whether you only want to look at the list of pending updates or actually patch the OS. The playbook consists of 3 parts (Part 1a, Part 1b and Part 2).

Option 1: Ansible Command used if just checking

$ ansible-playbook security.yml --extra-vars "ext_permit_flag=no"

Part 1a: Get the List of Packages from DNF to be upgraded ONLY using the External Permit Flag = “no”

- name: Get the list of Packages from DNF to be upgraded (ext_permit_flag == "no")
  dnf:
    security: yes
    bugfix: false
    state: latest
    update_cache: yes
    list: updates
    exclude: 'kernel*'
  register: register_output_security
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - ext_permit_flag == "no"

Part 1b: Report the List of Packages from DNF to be upgraded ONLY using the External Permit Flag = “no”

- name: Report the List of Packages from DNF to be upgraded (ext_permit_flag == "no")
  debug:
    msg: "{{ register_output_security.results | map(attribute='name') | list }}"
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - ext_permit_flag == "no"

Option 2: Ansible Command used when ready for Patching

$ ansible-playbook security.yml --extra-vars "ext_permit_flag=yes"

Part 2: Patch all the packages except Kernel

- name: Patch all the packages except Kernel
  dnf:
    name: '*'
    security: yes
    bugfix: false
    state: latest
    update_cache: yes
    update_only: no
    exclude: 'kernel*'
  register: register_update_success
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - ext_permit_flag == "yes"

- name: Print Errors if upgrade failed
  debug:
    msg: "Patch Update Failed"
  when: register_update_success is not defined
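
One thing to be aware of: Ansible registers the variable even when a task fails or is skipped, and a failed task normally stops the play for that host, so the "is not defined" check may never fire. A block with a rescue section is one way to react to a failed update. The following is a sketch only, reusing the task from Part 2 with a shortened option list; the wording of the error message is my own.

```yaml
- name: Patch with explicit failure handling (sketch)
  block:
    - name: Patch all the packages except Kernel
      ansible.builtin.dnf:
        name: '*'
        security: true
        state: latest
        update_cache: true
        exclude: 'kernel*'
      register: register_update_success
  rescue:
    # Runs only when a task in the block returns a failed state
    - name: Print Errors if upgrade failed
      ansible.builtin.debug:
        msg: "Patch Update Failed: {{ ansible_failed_result.msg | default('no message returned') }}"
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - ext_permit_flag == "yes"
```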

November 22, 2023 by kittycool only

Optimizing Firewalld Configuration with Ansible’s with_items Parameter

Ansible is great for configuring a host-based firewall like Firewalld. One thing you will note is that the tasks below use with_items a lot. It is very useful in this case because each item carries several parameters (port, protocol, state and zone).

- name: FirewallD Rules (Ports)
  firewalld:
    permanent: yes
    immediate: yes
    port: "{{item.port}}/{{item.proto}}"
    state: "{{item.state}}"
    zone: "{{item.zone}}"
  with_items:
    - {port: "80", proto: "tcp", state: "enabled", zone: "public" }
    - {port: "80", proto: "udp", state: "enabled", zone: "public" }
    - {port: "443", proto: "tcp", state: "disabled", zone: "public" }
    - {port: "443", proto: "udp", state: "disabled", zone: "public" }


- name: FirewallD Rules (Services)
  firewalld:
    permanent: yes
    immediate: yes
    service: "{{item.service}}"
    state: "{{item.state}}"
    zone: "{{item.zone}}"
  with_items:
    - {service: "cockpit", state: "disabled", zone: "public" }

- name: Turn on Firewalld.service on Compute Nodes
  systemd:
    name: firewalld
    state: started
    enabled: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
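
Newer Ansible documentation generally recommends loop for simple lists like these. As a rough sketch, the ports task could be written as follows; firewalld_port_rules is a hypothetical variable name that I made up, and it could just as well live in group_vars instead of on the task.

```yaml
- name: FirewallD Rules (Ports), loop variant
  ansible.posix.firewalld:
    permanent: true
    immediate: true
    port: "{{ item.port }}/{{ item.proto }}"
    state: "{{ item.state }}"
    zone: "{{ item.zone }}"
  loop: "{{ firewalld_port_rules }}"
  vars:
    # Hypothetical variable; each entry carries the same keys as the with_items version
    firewalld_port_rules:
      - {port: "80", proto: "tcp", state: "enabled", zone: "public"}
      - {port: "443", proto: "tcp", state: "disabled", zone: "public"}
```

Keeping the rules in a variable means the same task can serve different host groups with different rule lists.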

October 8, 2023 by kittycool only

Mounting and Unmounting NFS File Systems Using Ansible: Essential Tutorial

You can use Ansible to automate the configuration of NFS Client Settings

1. Mount an NFS File system, and configure in /etc/fstab

Use state: mounted

- name: Mount NFS Share nfs-server:/usr/local
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: mounted

2. Unmount an NFS File System, but leave /etc/fstab unmodified

Use state: unmounted

- name: Unmount NFS Share nfs-server:/usr/local
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: unmounted

3. Unmount an NFS File System, and remove settings from /etc/fstab

Use state: absent. Note that this state also removes the mount point directory.

- name: Unmount NFS Share nfs-server:/usr/local and remove it from /etc/fstab
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: absent

4. Remount an NFS File System, without changing /etc/fstab

Use state: remounted

- name: Remount NFS Share nfs-server:/usr/local
  ansible.posix.mount:
      src: nfs-server:/usr_local
      path: /usr/local
      fstype: nfs
      opts: rw,nconnect=16,nfsvers=3,tcp,hard,intr,timeo=600,retrans=2,rsize=524288,wsize=524288
      state: remounted
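
Since the four tasks differ only in the state, you could keep a single task and pass the state in as a variable. This is just a sketch: nfs_mount_state is a hypothetical variable name of my own, and the mount options are shortened for readability.

```yaml
- name: Manage NFS Share with a selectable state (sketch)
  ansible.posix.mount:
    src: nfs-server:/usr_local
    path: /usr/local
    fstype: nfs
    opts: rw,nfsvers=3,tcp,hard
    # nfs_mount_state is a hypothetical variable: mounted, unmounted, absent or remounted
    state: "{{ nfs_mount_state | default('mounted') }}"
```

For example, assuming the task lives in a playbook called nfs.yml, running ansible-playbook nfs.yml --extra-vars "nfs_mount_state=unmounted" would unmount the share and leave /etc/fstab alone.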

August 11, 2023 by kittycool only

Automating the Linux Client Server for Centrify and 2FA on Rocky Linux 8

The whole manual setup including those on the Active Directory can be found at Preparing a Linux Client Server for Centrify and 2FA for CentOS-7

If you just want to automate the Linux portion, here is something you may wish to consider.

Update the sshd_config template. The most important part is that “PasswordAuthentication no” and “ChallengeResponseAuthentication yes” are present. The whole sshd_config template is too large for me to put into the blog, so only the relevant excerpt is shown.

.....
.....
# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no
PasswordAuthentication no

# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes
ChallengeResponseAuthentication yes
.....
.....

- name: Generate /etc/ssh/sshd_config from /etc/ssh/sshd_config.j2 template
  template:
      src: ../templates/sshd_config.j2
      dest: /etc/ssh/sshd_config
      owner: root
      group: root
      mode: 0600
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

- name: Restart SSH Service
  systemd:
    name: sshd
    state: restarted
    enabled: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
  changed_when: false
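
The restart task above runs on every play. If you would rather restart sshd only when the template actually changes the file, a handler is one option. The sketch below is written as a small standalone play and is not the setup I originally used; the validate line is an extra safety check that asks sshd to test the new file before it is put in place.

```yaml
- name: Apply sshd_config and restart sshd only when it changes (sketch)
  hosts: all
  become: true
  tasks:
    - name: Generate /etc/ssh/sshd_config from /etc/ssh/sshd_config.j2 template
      ansible.builtin.template:
        src: ../templates/sshd_config.j2
        dest: /etc/ssh/sshd_config
        owner: root
        group: root
        mode: '0600'
        # %s is replaced with the path of the temporary file being validated
        validate: /usr/sbin/sshd -t -f %s
      notify: Restart SSH Service
  handlers:
    # Handlers run once at the end of the play, and only if notified
    - name: Restart SSH Service
      ansible.builtin.systemd:
        name: sshd
        state: restarted
```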

Here is Centrify_2FA.yml to insert the IwaTrustRoot.pem certificate

- name: Copy IwaTrustRoot.pem to /etc/pki/ca-trust/source/anchors/
  template:
      src: /usr/local/software/certificate/IwaTrustRoot.pem
      dest: /etc/pki/ca-trust/source/anchors/
      owner: root
      group: root
      mode: 0600
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

- name: Copy IwaTrustRoot.pem to /var/centrify/net/certs
  template:
      src: /usr/local/software/certificate/IwaTrustRoot.pem
      dest: /var/centrify/net/certs
      owner: root
      group: root
      mode: 0600
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Restart CentrifyDC and run adflush to flush its cache so that the information from AD is refreshed.

- name: CentrifyDC Restart
  ansible.builtin.shell: "/usr/share/centrifydc/bin/centrifydc restart"
  register: centrifydc_status
  changed_when: false

- name: Active Directory Flush
  ansible.builtin.shell: "adflush -f"
  register: flush_status
  changed_when: false

- name: Centrify Service Restarted
  debug:
    msg: "CentrifyDC restart output: {{ centrifydc_status.stdout }}"

August 2, 2023 by kittycool only

Installing CUDA with Ansible for Rocky Linux 8

Installation Guide

You can take a look at Nvidia CUDA Installation Guide for more information

Step 1: Get the Nvidia CUDA Repo

You can find the Repo from the Nvidia Download Sites. It should be named cuda_rhel8.repo. Copy it and use it as a template with a j2 extension.

[cuda-rhel8-x86_64]
name=cuda-rhel8-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/D42D0685.pub

Step 2: Use Ansible to Generate the repo from Templates.

The Ansible Script should look like this.

- name: Generate /etc/yum.repos.d/cuda_rhel8.repo
  template:
    src: ../templates/cuda-rhel8-repo.j2
    dest: /etc/yum.repos.d/cuda_rhel8.repo
    owner: root
    group: root
    mode: 0644
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Step 3: Install the Kernel-Headers and Kernel-Devel

The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt.

- name: Install Kernel-Headers and Kernel-Devel
  dnf:
    name:
        - kernel-devel
        - kernel-headers
    state: present
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
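
The task above installs whatever version of the packages the repositories currently offer, which may be newer than the running kernel. If you need the packages to match the running kernel, one possible variation is to append the ansible_kernel fact (the same value as uname -r) to the package names. This is a sketch and assumes the matching packages are still available in your repositories.

```yaml
- name: Install Kernel-Headers and Kernel-Devel for the running kernel (sketch)
  ansible.builtin.dnf:
    name:
      # ansible_kernel holds the running kernel release, as reported by uname -r
      - "kernel-devel-{{ ansible_kernel }}"
      - "kernel-headers-{{ ansible_kernel }}"
    state: present
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
```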

Step 4: Disabling Nouveau

To install the Display Driver, the Nouveau drivers must first be disabled. I use a template to disable it. I created a template called blacklist-nouveau-conf.j2. Here is the content

blacklist nouveau
options nouveau modeset=0

The Ansible script for disabling Nouveau using a template

- name: Generate blacklist nouveau
  template:
    src: ../templates/blacklist-nouveau-conf.j2
    dest: /etc/modprobe.d/blacklist-nouveau.conf
    owner: root
    group: root
    mode: 0644
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Step 5: Install the Drivers and CUDA

- name: Install driver packages RHEL 8 and newer
  dnf:
    name: '@nvidia-driver:latest-dkms'
    state: present
    update_cache: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
  register: install_driver

- name: Install CUDA
  dnf:
    name: cuda
    state: present
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
  register: install_cuda

Step 6: Reboot if there are changes to Drivers and CUDA

- name: Reboot if there are changes to Drivers or CUDA
  ansible.builtin.reboot:
  when:
    - install_driver.changed or install_cuda.changed
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Aftermath

After the reboot, run the “nvidia-smi” command. If the driver has loaded, you should see a table showing the driver version and the detected GPUs.
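
You could also let Ansible run the check for you after the reboot. This is a minimal sketch; the task names and the nvidia_smi_output variable are my own.

```yaml
- name: Run nvidia-smi after the reboot
  ansible.builtin.command: nvidia-smi
  register: nvidia_smi_output
  changed_when: false   # a read-only check, so never report a change

- name: Show the nvidia-smi output
  ansible.builtin.debug:
    var: nvidia_smi_output.stdout_lines
```

If the driver is not loaded, nvidia-smi exits with an error and the first task fails, which makes the problem visible in the play output.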

If you get the error “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver”, follow the steps in NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver for RHEL 8 and run the Ansible script in that blog entry.

You may also combine all these yml files into one large yml file.

Other Ansible Scripts

You may also want to consider other, possibly better, options such as https://github.com/NVIDIA/ansible-role-nvidia-docker

July 18, 2023 by kittycool only

Guide to Creating Symbolic Links with Ansible

You can use the ansible.builtin.file module with state: link. I use the same approach to link the Module Environment profile.csh and profile.sh into /etc/profile.d so that they load on startup; the example below applies it to a CUDA directory link. Do take a look at the Ansible Document ansible.builtin.file module – Manage files and file properties

- name: Check for CUDA Link
  stat: path=/usr/local/cuda
  register: link_available

- name: Create a symbolic link for CUDA
  ansible.builtin.file:
    src: /usr/local/cuda-12.2
    dest: /usr/local/cuda
    owner: root
    group: root
    state: link
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - not link_available.stat.exists
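
For the Module Environment profile scripts, a loop keeps it to a single task. Please treat this as a sketch: the source directory /usr/share/Modules/init is an assumption on my part, so adjust it to wherever your Modules installation keeps profile.sh and profile.csh.

```yaml
- name: Link the Environment Modules profile scripts into /etc/profile.d (sketch)
  ansible.builtin.file:
    # Source directory is an assumption; adjust it to your Modules installation
    src: "/usr/share/Modules/init/{{ item }}"
    dest: "/etc/profile.d/{{ item }}"
    owner: root
    group: root
    state: link
  loop:
    - profile.sh
    - profile.csh
```

The file module with state: link is idempotent, so an existing correct link is simply left alone.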
