Release of MVAPICH2 2.3.4 GA and OSU Micro-Benchmarks (OMB) 5.6.3

The MVAPICH team is pleased to announce the release of MVAPICH2 2.3.4 GA and OSU Micro-Benchmarks (OMB) 5.6.3.

Features, enhancements, and bug fixes for MVAPICH2 2.3.4 GA are as follows:

* Features and Enhancements (since 2.3.3):

  • Improved performance for small message collective operations
  • Improved performance for data transfers from/to non-contiguous buffers used by user-defined datatypes
  • Add custom API to identify whether MVAPICH2 has built-in CUDA support
    • New API ‘MPIX_Query_cuda_support’ defined in mpi-ext.h
    • New macro ‘MPIX_CUDA_AWARE_SUPPORT’ defined in mpi-ext.h
  • Add support for MPI_REAL16 based reduction operations for Fortran programs
    • MPI_SUM, MPI_MAX, MPI_MIN, MPI_LAND, MPI_LOR, MPI_MINLOC, and MPI_MAXLOC
    • Thanks to Greg Lee@LLNL for the report and reproducer
    • Thanks to Hui Zhou@ANL for the initial patch
  • Add support to intercept aligned_alloc in ptmalloc
    • Thanks to Ye Luo @ANL for the report and the reproducer
  • Add support to enable fork safety in MVAPICH2 using environment variable
    • “MV2_SUPPORT_FORK_SAFETY”
  • Add support for user to modify QKEY using environment variable
    • “MV2_DEFAULT_QKEY”
  • Add multiple MPI_T PVARs and CVARs for point-to-point and collective operations
  • Enhanced point-to-point and collective tuning for AMD EPYC Rome, Frontera@TACC, Longhorn@TACC, Mayer@Sandia, Pitzer@OSC, Catalyst@EPCC, Summit@ORNL, Lassen@LLNL, and Sierra@LLNL systems
  • Give preference to CMA if LiMIC2 and CMA are enabled at the same time
  • Move -lmpi, -lmpicxx, and -lmpifort before other LDFLAGS in compiler wrappers like mpicc, mpicxx, mpif77, and mpif90
  • Allow passing flags to nvcc compiler through environment variable NVCCFLAGS
  • Display more meaningful error messages for InfiniBand asynchronous events
  • Add support for AMD Optimizing C/C++ (AOCC) compiler v2.1.0
  • Add support for GCC compiler v10.1.0
    • Requires setting FFLAGS=-fallow-argument-mismatch at configure time
  • Update to hwloc v2.2.0
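
The CUDA-support query listed above (‘MPIX_Query_cuda_support’ and ‘MPIX_CUDA_AWARE_SUPPORT’ in mpi-ext.h) could be used roughly as in the sketch below. The announcement gives only the two names and the header, so the rest is an assumption: that the macro is defined to a nonzero value when the library was built with CUDA support, and that the function takes no arguments and returns nonzero when CUDA support is available (the convention used by the same-named Open MPI extension). The helper names are hypothetical, and the header check is there only so the file still compiles where mpi-ext.h is absent.

```c
#include <stdio.h>

/* Include the MPI headers only if the extension header is present,
 * so this file also compiles against installations without mpi-ext.h. */
#if defined(__has_include)
#  if __has_include(<mpi-ext.h>)
#    include <mpi.h>
#    include <mpi-ext.h>
#  endif
#endif

/* Hypothetical helper: map a status code to a printable string. */
const char *describe(int status)
{
    return status == 1 ? "yes" : status == 0 ? "no" : "unknown";
}

/* Compile-time view: 1 = built with CUDA support, 0 = built without,
 * -1 = the macro is not visible.
 * ASSUMPTION: a nonzero macro value means CUDA support is built in. */
int cuda_support_compile_time(void)
{
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    return 1;
#elif defined(MPIX_CUDA_AWARE_SUPPORT)
    return 0;
#else
    return -1;
#endif
}

/* Run-time view: asks the library that is actually linked.
 * ASSUMPTION: MPIX_Query_cuda_support() takes no arguments and
 * returns nonzero when CUDA support is available. */
int cuda_support_run_time(void)
{
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    return MPIX_Query_cuda_support() ? 1 : 0;
#else
    return -1;
#endif
}

/* Print both views, one per line. */
void print_cuda_support(void)
{
    printf("Compile-time CUDA support: %s\n",
           describe(cuda_support_compile_time()));
    printf("Run-time CUDA support:     %s\n",
           describe(cuda_support_run_time()));
}
```

In an MPI program built with mpicc, print_cuda_support() would typically be called between MPI_Init and MPI_Finalize. Whether the run-time query strictly requires MPI to be initialized is not stated in this announcement.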

 

* Bug Fixes (since 2.3.3):

  • Fix compilation issue with IBM XLC++ compilers and CUDA 10.2
  • Fix hangs with MPI_Get operations in UD-Hybrid mode
  • Initialize MPI3 data structures correctly to avoid random hangs caused by garbage values
  • Fix corner case with LiMIC2 and MPI3 one-sided operations
  • Add proper fallback and warning message when shared RMA window cannot be created
  • Fix race condition in calling mv2_get_path_rec_sl by introducing mutex
    • Thanks to Alexander Melnikov for reporting the issue and providing the patch
  • Fix mapping generation for the cases where hwloc returns zero on non-NUMA machines
    • Thanks to Honggang Li @Red Hat for the report and initial patch
  • Fix issues with InfiniBand registration cache and PGI20 compiler
  • Fix warnings raised by Coverity scans
    • Thanks to Honggang Li @Red Hat for the report
  • Fix bad baseptr address returned from MPI_Win_shared_query
    • Thanks to Adam Moody@LLNL for the report and discussion
  • Fix issues with HCA selection logic in heterogeneous multi-rail scenarios
  • Fix spelling mistake in error message
    • Thanks to Bill Long and Krishna Kandalla @Cray/HPE for the report
  • Fix compilation warnings and memory leaks

 

New features, enhancements, and bug fixes for OSU Micro-Benchmarks (OMB) 5.6.3 are listed here:

* New Features & Enhancements (since v5.6.2)

  • Add support for benchmarking applications that use ‘fork’ system call
    • osu_latency_mp

 

* Bug Fixes (since v5.6.2)

  • Fix compilation issue with IBM XLC++ compilers and CUDA 10.2
  • Allow passing flags to nvcc compiler
  • Fix issues in window creation with host-to-device and device-to-host transfers for one-sided tests

To download MVAPICH2 2.3.4 GA, OMB 5.6.3, and the associated user guides and quick start guide, or to access the SVN repository, please visit the following URL:

http://mvapich.cse.ohio-state.edu