Using Intel Cluster Checker (Part 1)


What is Intel Cluster Checker?

Intel® Cluster Checker provides tools to collect data from a cluster, analyze the collected data, and produce a clear report of the analysis. Using Intel® Cluster Checker helps you quickly identify issues and improve resource utilization.

Intel® Cluster Checker verifies the configuration and performance of Linux®-based clusters through analysis of cluster uniformity, performance characteristics, functionality and compliance with Intel® High Performance Computing (HPC) specifications. Data collection tools and analysis provide actionable remedies to identified issues. Intel® Cluster Checker tools and analysis are ideal for use by developers, administrators, architects, and users to easily identify issues within a cluster.

Installing Intel Cluster Checker Using Yum Repository

If you are installing from the Yum repository, see Intel Cluster Checker 2019 Installation.

Otherwise, if you have the tar.gz package, you can extract it with tar.

Environment Setup

# source /usr/local/intel/2018u3/bin/compilervars.sh intel64
# source /usr/local/intel/2018u3/mkl/bin/mklvars.sh intel64
# source /usr/local/intel/2018u3/impi/2018.3.222/bin64/mpivars.sh intel64
# source /usr/local/intel/2018u3/parallel_studio_xe_2018/bin/psxevars.sh intel64
# export MPI_ROOT=/usr/local/intel/2018u3/impi/2018.3.222/intel64
# source /usr/local/intel/cc2019/clck/2019.10/bin/clckvars.sh
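Once the scripts above have been sourced, it can be worth confirming that the clck binary is actually on your PATH before going further. The following is a minimal sketch; have_cmd is a hypothetical helper name introduced here, not part of Intel Cluster Checker.

```shell
# have_cmd NAME: succeed if NAME resolves to something runnable in this shell.
# (have_cmd is a helper made up for this post.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd clck; then
    echo "clck found at: $(command -v clck)"
else
    echo "clck not found; re-check the clckvars.sh path you sourced"
fi
```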

Create a nodefile that lists the cluster hosts, one per line:

% vim nodefile
node1
node2
node3
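If you would rather not type the list by hand, the same file can be generated from the shell. This is a minimal sketch that assumes the three example host names used above; substitute your own host names and count.

```shell
# Write one host name per line into ./nodefile.
# node1..node3 are the example names from this post, not real defaults.
for i in 1 2 3; do
    printf 'node%s\n' "$i"
done > nodefile

# Show the result so it can be checked by eye.
cat nodefile
```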

Running Intel Cluster Checker

*Make sure you can log in to the nodes over SSH without a password. See SSH Login without Password.
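One way to verify this before a run is to attempt a non-interactive login to every host in the nodefile. The sketch below assumes OpenSSH's ssh client; check_ssh is a hypothetical helper name, and the 5 second timeout is an arbitrary choice.

```shell
# check_ssh NODEFILE
# Try a non-interactive login to every host listed in NODEFILE.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# Returns 0 if every host accepted the login, 1 otherwise.
check_ssh() {
    nodefile=$1
    failed=0
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue    # skip blank lines
        # </dev/null keeps ssh from swallowing the rest of the nodefile
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$nodefile"
    return "$failed"
}
```

Call it as check_ssh nodefile. Any host reported as FAIL needs its key-based login fixed before clck can collect data from it.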

% clck -f nodefile

Example run:

Running Collect

................................................................................................................................................................................................................
Running Analyze

SUMMARY
Command-line: clck -f nodefile
Tests Run: health_base
**WARNING**: 3 tests failed to run. Information may be incomplete. See clck_execution_warnings.log for more information.
Overall Result: 8 issues found - HARDWARE UNIFORMITY (2), PERFORMANCE (2), SOFTWARE UNIFORMITY (4)
-----------------------------------------------------------------------------------------------------------------------------------------
8 nodes tested: node010, node[003-009]
0 nodes with no issues:
8 nodes with issues: node010, node[003-009]
-----------------------------------------------------------------------------------------------------------------------------------------
FUNCTIONALITY
No issues detected.

HARDWARE UNIFORMITY
The following hardware uniformity issues were detected:
1. The InfiniBand PCI physical slot for device 'MT27800 Family [ConnectX-5]' 

PERFORMANCE
The following performance issues were detected:
1. Zombie processes detected.
1 node: node010
2. Processes using high CPU.
7 nodes: node010, node[003,005-009]

SOFTWARE UNIFORMITY
The following software uniformity issues were detected:
1. The OFED version, 'MLNX_OFED_LINUX-4.5-1.0.1.0 (OFED-4.5-1.0.1)', is not uniform.....
5 nodes: node[003-004,006-007,009]
2. The OFED version, 'MLNX_OFED_LINUX-4.3-1.0.1.0 (OFED-4.3-1.0.1)', is not uniform.....
3 nodes: node010, node[005,008]
3. Environment variables are not uniform across the nodes.
.....
4. Inconsistent Ethernet driver version.
.....

See the following files for more information: clck_results.log, clck_execution_warnings.log
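If you want to track the result from a script, the issue count can be pulled out of the summary. This is only a sketch: it assumes you saved the summary text to a file yourself (for example by redirecting the output of clck), and that the "Overall Result" line has the same shape as in the run above. Other versions may word it differently. issue_count is a hypothetical helper name.

```shell
# issue_count FILE
# Print the number from the first "Overall Result: N issues found" line.
# Returns 2 if FILE has no "Overall Result" line at all.
# Prints nothing if the line is present but carries no leading number.
# Assumes the line format shown in the example run in this post.
issue_count() {
    line=$(grep '^Overall Result:' "$1" | head -n 1)
    [ -n "$line" ] || return 2
    # Keep only the leading number, e.g. "8" from "8 issues found - ...".
    printf '%s\n' "$line" | sed -n 's/^Overall Result: \([0-9][0-9]*\) issue.*/\1/p'
}
```

With the summary above saved to a file, issue_count on that file would print 8.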

Intel MPI Library Troubleshooting

If you are an administrator and want to make sure your cluster is set up to work with the Intel® MPI Library, run the following:

% clck -f nodefile -F mpi_prereq_admin

If you are a non-privileged user and want to make sure the cluster is set up to work with the Intel® MPI Library, run the following:

% clck -f nodefile -F mpi_prereq_user
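The choice between the two can be scripted. The sketch below treats "admin" as meaning the root user (uid 0), which is an assumption made here for illustration; mpi_prereq_framework is a hypothetical helper name. It only prints the command rather than running it.

```shell
# mpi_prereq_framework UID
# Map a numeric user id to one of the two framework names used above:
# root (uid 0) gets the admin check, everyone else the user check.
# Treating uid 0 as "admin" is an assumption of this sketch.
mpi_prereq_framework() {
    if [ "$1" -eq 0 ]; then
        echo mpi_prereq_admin
    else
        echo mpi_prereq_user
    fi
}

# Show the command that would be run for the current user.
echo "clck -f nodefile -F $(mpi_prereq_framework "$(id -u)")"
```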

More Information:

  1. Using Intel Cluster Checker (Part 1)
  2. Using Intel Cluster Checker (Part 2)
  3. Using Intel Cluster Checker (Part 3)

