The MVAPICH team is pleased to announce the release of MVAPICH2 2.3.4 GA and OSU Micro-Benchmarks (OMB) 5.6.3.
Features and enhancements for MVAPICH2 2.3.4 GA are as follows:
* Features and Enhancements (since 2.3.3):
- Improved performance for small message collective operations
- Improved performance for data transfers from/to non-contiguous buffers used by user-defined datatypes
- Add custom API to identify if MVAPICH2 has in-built CUDA support
    - New API ‘MPIX_Query_cuda_support’ defined in mpi-ext.h
    - New macro ‘MPIX_CUDA_AWARE_SUPPORT’ defined in mpi-ext.h
- Add support for MPI_REAL16 based reduction operations for Fortran programs
    - MPI_SUM, MPI_MAX, MPI_MIN, MPI_LAND, MPI_LOR, MPI_MINLOC, and MPI_MAXLOC
    - Thanks to Greg Lee@LLNL for the report and reproducer
    - Thanks to Hui Zhou@ANL for the initial patch
- Add support to intercept aligned_alloc in ptmalloc
    - Thanks to Ye Luo @ANL for the report and the reproducer
- Add support to enable fork safety in MVAPICH2 using environment variable
    - “MV2_SUPPORT_FORK_SAFETY”
- Add support for user to modify QKEY using environment variable
    - “MV2_DEFAULT_QKEY”
- Add multiple MPI_T PVARs and CVARs for point-to-point and collective operations
- Enhanced point-to-point and collective tuning for AMD EPYC Rome, Frontera@TACC, Longhorn@TACC, Mayer@Sandia, Pitzer@OSC, Catalyst@EPCC, Summit@ORNL, Lassen@LLNL, and Sierra@LLNL systems
- Give preference to CMA if LiMIC2 and CMA are enabled at the same time
- Move -lmpi, -lmpicxx, and -lmpifort before other LDFLAGS in compiler wrappers like mpicc, mpicxx, mpif77, and mpif90
- Allow passing flags to nvcc compiler through environment variable NVCCFLAGS
- Display more meaningful error messages for InfiniBand asynchronous events
- Add support for AMD Optimizing C/C++ (AOCC) compiler v2.1.0
- Add support for GCC compiler v10.1.0
    - Requires setting FFLAGS=-fallow-argument-mismatch at configure time
- Update to hwloc v2.2.0
* Bug Fixes (since 2.3.3):
- Fix compilation issue with IBM XLC++ compilers and CUDA 10.2
- Fix hangs with MPI_Get operations in UD-Hybrid mode
- Initialize MPI3 data structures correctly to avoid random hangs caused by garbage values
- Fix corner case with LiMIC2 and MPI3 one-sided operations
- Add proper fallback and warning message when shared RMA window cannot be created
- Fix race condition in calling mv2_get_path_rec_sl by introducing mutex
    - Thanks to Alexander Melnikov for reporting the issue and providing the patch
- Fix mapping generation for the cases where hwloc returns zero on non-numa machines
    - Thanks to Honggang Li @Red Hat for the report and initial patch
- Fix issues with InfiniBand registration cache and PGI20 compiler
- Fix warnings raised by Coverity scans
    - Thanks to Honggang Li @Red Hat for the report
- Fix bad baseptr address returned from MPI_Win_shared_query
    - Thanks to Adam Moody@LLNL for the report and discussion
- Fix issues with HCA selection logic in heterogeneous multi-rail scenarios
- Fix spelling mistake in error message
    - Thanks to Bill Long and Krishna Kandalla @Cray/HPE for the report
- Fix compilation warnings and memory leaks
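
The CUDA-support query listed among the 2.3.4 features (‘MPIX_Query_cuda_support’ and ‘MPIX_CUDA_AWARE_SUPPORT’ in mpi-ext.h) could be used roughly as sketched below. This is a minimal illustration, not code from the release. It assumes that the macro is defined to a nonzero value when the library was built with CUDA support and that the function takes no arguments and returns nonzero when CUDA support is available at run time, which is the convention other MPI libraries use for the same names. The helper names `cuda_support_compile_time` and `cuda_support_run_time` are hypothetical, and the `__has_include` guards are there only so the sketch still compiles on a machine without MPI headers.

```c
#include <stdio.h>

/* Pull in the MPI headers only if they are present, so this sketch also
 * builds where MPI is not installed (the guard is not part of MVAPICH2). */
#if defined(__has_include)
#  if __has_include(<mpi.h>)
#    include <mpi.h>
#    if __has_include(<mpi-ext.h>)
#      include <mpi-ext.h>
#    endif
#  endif
#endif

/* Compile-time view: 1 = built with CUDA support, 0 = built without,
 * -1 = the macro is not visible, so nothing can be concluded. */
static int cuda_support_compile_time(void)
{
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    return 1;
#elif defined(MPIX_CUDA_AWARE_SUPPORT)
    return 0;
#else
    return -1;
#endif
}

/* Run-time view, with the same encoding as above.
 * Assumption: MPIX_Query_cuda_support() returns nonzero when supported. */
static int cuda_support_run_time(void)
{
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    return MPIX_Query_cuda_support() ? 1 : 0;
#else
    return -1;
#endif
}
```

An application would typically call both helpers from its `main` after `MPI_Init`, build with the `mpicc` wrapper, and pass device pointers to MPI calls only when the run-time check reports 1.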
New features, enhancements, and bug fixes for OSU Micro-Benchmarks (OMB) 5.6.3 are listed here:
* New Features & Enhancements (since v5.6.2)
- Add support for benchmarking applications that use ‘fork’ system call
    * osu_latency_mp
* Bug Fixes (since v5.6.2)
- Fix compilation issue with IBM XLC++ compilers and CUDA 10.2
- Allow passing flags to nvcc compiler
- Fix issues in window creation with host-to-device and device-to-host transfers for one-sided tests
To download MVAPICH2 2.3.4 GA, OMB 5.6.3, and the associated user guides and quick start guide, or to access the SVN repository, please visit the following URL: