Modern workloads are diverse—and so are architectures. No single architecture is best for every workload. Maximizing performance takes a mix of scalar, vector, matrix, and spatial architectures deployed in CPU, GPU, FPGA, and other future accelerators. Heterogeneity adds complexity that can be difficult to debug. This article introduces the new features of Intel® Inspector that support the analysis of code that’s offloaded to accelerators.
For more information: Analyzing Memory and Threading Correctness for GPU-Offloaded Code