RELION – Performance Benchmark and Profiling

What is RELION?

RELION (REgularized LIkelihood OptimizatioN) is an open-source program for the refinement of macromolecular structures by single-particle analysis of electron cryo-microscopy (cryo-EM) data.

RELION implements an empirical Bayesian approach for the analysis of cryo-EM data.

RELION provides refinement methods for single or multiple 3D reconstructions as well as 2D class averages.

RELION is an important tool in the study of living cells.

HPC-AI Advisory Council

Performance Analysis Summary

(From the article RELION – Performance Benchmark and Profiling)

RELION performance testing

  • Pool sizes of 4, 8, and 16 gave the best performance on 16, 24, and 32 nodes
  • SHARP In-Network Computing reduces MPI time by 13% and increases overall application performance by 5%
  • The performance advantage increases with system size; up to 32 nodes were tested

RELION Profile

  • Rank #0 does not perform computation
  • Mostly MPI_Barrier (70%)
  • Ring communication matrix

Using FIO to measure IO Performance

FIO is a tool for measuring IO performance. You can use FIO to run a user-defined workload and collect the associated performance data.

A good write-up can be found at Fio Basics. For more information, see the fio man page (man fio).

Step 1: To install fio, make sure you have enabled the EPEL repository.

# yum install fio

Parameters to be used.

% fio --filename=myfio --size=5GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1
  • --filename
    A file name or an entire block device can be specified. For example, to test block devices, use filename=/dev/sda:/dev/sdb
  • --size
    The size of the test file to generate. If you are testing a block device and do not specify a size, the entire device is used.
  • --direct=1
    Use non-buffered I/O (O_DIRECT), so the page cache is bypassed and no memory is used for caching.
  • --rw
    Specifies whether I/O is issued sequentially or randomly. The options are:
    read – sequential reads
    write – sequential writes
    randread – random reads
    randwrite – random writes
    readwrite, rw – mixed sequential workload
    randrw – mixed random workload
  • --bs
    Block size. The default is 4 kB.
  • --ioengine=libaio
    libaio enables asynchronous access from the application level. On Linux it generally needs --direct=1 to queue I/O asynchronously.
  • --iodepth
    The number of I/O requests to keep in flight against the file.
  • --numjobs
    The number of copies of the job to run in parallel.
  • --runtime
    Terminate processing after the specified period of time (in seconds unless a unit is given).
  • --name
    Name of the job.
  • --time_based
    fio keeps running for the full runtime specified, repeating the workload if the file is completely read or written before then.
  • --group_reporting
    Display statistics for groups of jobs as a whole instead of for each individual job. This is especially useful when numjobs is used.
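The same workload can also be kept in a job file, which is easier to rerun than a long command line. The snippet below is a minimal sketch that writes a job file mirroring the command above. The file name iops-test.fio is an assumption made for this example, and --eta-newline stays on the command line because it is a fio command-line option rather than a job parameter.

```shell
#!/bin/sh
# Write a fio job file equivalent to the command line shown above.
# The file name is a hypothetical choice for this example.
JOBFILE="${JOBFILE:-iops-test.fio}"

cat > "$JOBFILE" <<'EOF'
[global]
filename=myfio
size=5GB
direct=1
rw=randrw
bs=4k
ioengine=libaio
iodepth=256
runtime=120
time_based
group_reporting
numjobs=4

[iops-test-job]
EOF

# fio itself is not started here; the message shows how to launch the job.
echo "Wrote $JOBFILE; run it with: fio --eta-newline=1 $JOBFILE"
```

Parameters under [global] apply to every job in the file, so further job sections can be added later without repeating them.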

 

General Linux OS Tuning for AMD EPYC

Step 1: Turn off swap
Turn off swap to prevent accidental swapping. Do note that disabling swap without sufficient memory can have undesired effects.

swapoff -a

Step 2: Turn off NUMA balancing
NUMA balancing can have undesired effects, and since it is possible to bind the ranks and memory in HPC, this setting is not needed.

echo 0 > /proc/sys/kernel/numa_balancing

Step 3: Disable ASLR
ASLR (Address Space Layout Randomization) is a security feature used to prevent the exploitation of memory vulnerabilities.

echo 0 > /proc/sys/kernel/randomize_va_space
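The echo commands in Steps 2 and 3 take effect immediately but do not survive a reboot. One way to make them persistent is a sysctl drop-in file. The sketch below generates such a file in a scratch directory so that nothing under /etc is touched. The file name 90-hpc-tuning.conf and the OUTDIR variable are assumptions made for this example.

```shell
#!/bin/sh
# Generate a sysctl drop-in that mirrors the two echo commands above.
# OUTDIR defaults to a scratch directory; the file name is hypothetical.
OUTDIR="${OUTDIR:-$(mktemp -d)}"
CONF="$OUTDIR/90-hpc-tuning.conf"

cat > "$CONF" <<'EOF'
# Step 2: turn off automatic NUMA balancing
kernel.numa_balancing = 0
# Step 3: disable ASLR
kernel.randomize_va_space = 0
EOF

echo "Review $CONF, then as root copy it to /etc/sysctl.d/ and run: sysctl --system"
```

Swap is handled separately: swapoff -a is also temporary, and swap entries in /etc/fstab are activated again at the next boot.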

Step 4: Set CPU governor to performance and disable cc6
Setting the CPU governor to performance ensures maximum performance at all times. Disabling cc6 ensures that deeper CPU sleep states are not entered.

cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
.....
.....
cpupower idle-set -d 2
Idlestate 2 disabled on CPU 0
Idlestate 2 disabled on CPU 1
Idlestate 2 disabled on CPU 2
.....
.....
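After applying the four steps, a read-only check can confirm what is currently in effect. The script below is a sketch that only reads from /proc and /sys and prints "n/a" where a file is missing, for example inside a container. The helper names read_or_na and swap_count are made up for this example, and it inspects cpu0 only as a spot check.

```shell
#!/bin/sh
# Print the content of a file, or "n/a" if it cannot be read.
read_or_na() {
    if [ -r "$1" ]; then cat "$1"; else echo "n/a"; fi
}

# Count active swap areas: /proc/swaps has one header line,
# followed by one line per active swap device or file.
swap_count() {
    if [ -r "$1" ]; then
        awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
    else
        echo "n/a"
    fi
}

CPU0=/sys/devices/system/cpu/cpu0
echo "active swap areas:     $(swap_count /proc/swaps)"
echo "numa_balancing:        $(read_or_na /proc/sys/kernel/numa_balancing)"
echo "randomize_va_space:    $(read_or_na /proc/sys/kernel/randomize_va_space)"
echo "cpu0 governor:         $(read_or_na "$CPU0/cpufreq/scaling_governor")"
echo "cpu0 idle state 2:     $(read_or_na "$CPU0/cpuidle/state2/name")"
echo "cpu0 state 2 disabled: $(read_or_na "$CPU0/cpuidle/state2/disable")"
```

With the tuning applied, the first three values should read 0, the governor should read performance, and the disabled flag for idle state 2 should read 1.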

References:

  1. Tuning Guide for AMD EPYC (pdf)

Turning ksm and ksmtuned off

In this post, I describe how to turn off KSM and ksmtuned, since I do not need these services and turning them off saves some unnecessary swapping activity on the disk.

What is KSM?

According to the Red Hat documentation (8.4. Kernel Same-Page Merging (KSM)):
Kernel same-page Merging (KSM), used by the KVM hypervisor, allows KVM guests to share identical memory pages. These shared pages are usually common libraries or other identical, high-use data. KSM allows for greater guest density of identical or similar guest operating systems by avoiding memory duplication……

KSM is a Linux feature which uses this concept in reverse. KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page……

8.4.4 Kernel same-page merging (KSM) has a performance overhead which may be too large for certain environments or host systems. KSM may also introduce side channels that could be potentially used to leak information across guests. If this is a concern, KSM can be disabled on per-guest basis.

Deactivating KSM

# systemctl stop ksmtuned
Stopping ksmtuned:                                         [  OK  ]
# systemctl stop ksm
Stopping ksm:                                              [  OK  ]

To permanently deactivate KSM, use the systemctl disable commands:

# systemctl disable ksm
# systemctl disable ksmtuned

When KSM is disabled, any memory pages that were shared prior to deactivating KSM are still shared. To delete all of the PageKSM in the system, use the following command:

# echo 2 >/sys/kernel/mm/ksm/run

After this is performed, the khugepaged daemon can rebuild transparent hugepages on the KVM guest physical memory. Using # echo 0 >/sys/kernel/mm/ksm/run stops KSM, but does not unshare all the previously created KSM pages (this is the same as the # systemctl stop ksmtuned command).
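To see which of these states KSM is in, the sysfs files can be read directly. The sketch below prints the current run value with a short description, followed by the shared-page counters. The helper name ksm_run_meaning is made up for this example, and the script only reads from /sys/kernel/mm/ksm.

```shell
#!/bin/sh
# Describe the value held in /sys/kernel/mm/ksm/run.
ksm_run_meaning() {
    case "$1" in
        0) echo "stopped, merged pages stay merged" ;;
        1) echo "running" ;;
        2) echo "stopped, all merged pages unmerged" ;;
        *) echo "unknown" ;;
    esac
}

KSM=/sys/kernel/mm/ksm
if [ -r "$KSM/run" ]; then
    run=$(cat "$KSM/run")
    echo "ksm run=$run ($(ksm_run_meaning "$run"))"
    echo "pages_shared=$(cat "$KSM/pages_shared") pages_sharing=$(cat "$KSM/pages_sharing")"
else
    echo "KSM sysfs interface not available on this system"
fi
```

After echo 2 has taken effect, pages_shared and pages_sharing should both drop to 0.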

References:

  1. Red Hat – 8.4. Kernel Same-Page Merging (KSM)