Analyzing Memory and Threading Correctness for GPU-Offloaded Code


Modern workloads are diverse, and so are architectures. No single architecture is best for every workload. Maximizing performance takes a mix of scalar, vector, matrix, and spatial architectures deployed in CPUs, GPUs, FPGAs, and other future accelerators. This heterogeneity adds complexity, which can make code difficult to debug. This article introduces the new features of Intel® Inspector that support the analysis of code that’s offloaded to accelerators.

For more information: Analyzing Memory and Threading Correctness for GPU-Offloaded Code
