Learn to Accelerate TensorFlow on Intel® Architecture with Minimal Code Changes

The OpenVINO™ integration with TensorFlow enables you to speed up the TensorFlow workflow by adding just two lines of code. Enhance performance on Intel platforms while using the familiar TensorFlow APIs. Download this whitepaper to get started.

Sign up to get the white paper: Learn to Accelerate TensorFlow on Intel® Architecture with Minimal Code Changes
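The whitepaper itself is not reproduced here. As a rough sketch of what "two lines of code" can look like, the example below assumes the two lines are the import and the set_backend call of the openvino_tensorflow Python package. The helper name enable_openvino and the "CPU" backend choice are illustrative assumptions, and the guard lets the script run even when the package is not installed.

```python
import importlib.util


def enable_openvino(backend="CPU"):
    """Try to route TensorFlow through OpenVINO; return True on success.

    `enable_openvino` is a hypothetical helper name used for illustration.
    """
    if importlib.util.find_spec("openvino_tensorflow") is None:
        return False  # add-on not installed; TensorFlow keeps running unchanged
    import openvino_tensorflow as ovtf  # added line 1: load the integration
    ovtf.set_backend(backend)           # added line 2: choose the target device
    return True


print("OpenVINO backend enabled:", enable_openvino("CPU"))
```

The rest of an existing TensorFlow script would stay as it is; only the two marked lines are new.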

Efficient Heterogeneous Parallel Programming Using OpenMP

This article is taken from Intel's “Efficient Heterogeneous Parallel Programming Using OpenMP”.

In some cases, offloading computations to an accelerator like a GPU means that the host CPU sits idle until the offloaded computations are finished. However, using the CPU and GPU resources simultaneously can improve the performance of an application. In OpenMP® programs that take advantage of heterogeneous parallelism, the master construct can be used to exploit simultaneous CPU and GPU execution. In this article, we will show you how to do CPU+GPU asynchronous calculation using OpenMP.

The Intel® oneAPI DPC++/C++ Compiler was used with the following command-line options:
-O3 -Ofast -xCORE-AVX512 -mprefer-vector-width=512 -ffast-math -qopt-multiple-gather-scatter-by-shuffles -fimf-precision=low
-fiopenmp -fopenmp-targets=spir64="-fp-model=precise"

OpenMP provides true asynchronous, heterogeneous execution on CPU+GPU systems. It’s clear from our timing results and VTune profiles that keeping the CPU and GPU busy in the OpenMP parallel region gives the best performance. We encourage you to try this approach.

Intel: Efficient Heterogeneous Parallel Programming Using OpenMP (Best Practices to Keep the CPU and GPU Working at the Same Time)

Building a Deployment-Ready TensorFlow Model (Part 1)

This is an interesting 3-part article on OpenVINO Deep Learning Workbench.

Pruning deep learning models, combining network layers, developing for multiple hardware targets—getting from a trained deep learning model to a ready-to-deploy inference model seems like a lot of work, which it can be if you hand code it.

With Intel® tools you can go from trained model to an optimized, packaged inference model entirely online without a single line of code. In this article, we’ll introduce you to the Intel® toolkits for deep learning deployments, including the Intel® Distribution of OpenVINO™ toolkit and Deep Learning Workbench. After that, we’ll get you signed up for a free Intel DevCloud for the Edge account so that you can start optimizing your own inference models.

The No-Code Approach to Deploying Deep Learning Models on Intel® Hardware

For more information, see The No-Code Approach to Deploying Deep Learning Models on Intel® Hardware

ModuleNotFoundError: No module named ‘torch’ for oneAPI AI Toolkit

If you are using the oneAPI environment and run into this error:

ModuleNotFoundError: No module named 'torch'

Here are some steps you can use to troubleshoot.

Make sure you have activated the oneAPI environment using the command below:

% source /usr/local/intel/oneapi/2021.3/setvars.sh

:: initializing oneAPI environment ...
   -bash: BASH_VERSION = 4.2.46(2)-release
:: clck -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: inspector -- latest
:: intelpython -- latest
:: ipp -- latest
:: itac -- latest
:: LPOT -- latest
:: mkl -- latest
:: modelzoo -- latest
:: mpi -- latest
:: pytorch -- latest
:: tbb -- latest
:: tensorflow -- latest
:: oneAPI environment initialized ::

Next, check which conda environments are available:

% conda info --envs

# conda environments:
myenv                    /myhome/melvin/.conda/envs/myenv
myfsl                    /myhome/melvin/.conda/envs/myfsl
base                  *  /usr/local/intel/oneapi/2021.3/intelpython/latest
2021.3.0                 /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/2021.3.0
myoneapi                 /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/myoneapi
pytorch                  /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/pytorch
pytorch-1.8.0            /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/pytorch-1.8.0
tensorflow               /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/tensorflow
tensorflow-2.5.0         /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/tensorflow-2.5.0
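When the list is long, it can help to filter it for the environments that are likely to contain PyTorch. The sketch below parses text in the layout shown above. The function name parse_conda_envs is a hypothetical helper, the sample is a shortened copy of the listing, and paths containing spaces are not handled.

```python
def parse_conda_envs(text):
    """Parse the text printed by `conda info --envs` into (name, path, active) tuples."""
    envs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        parts = line.split()
        active = "*" in parts                    # the asterisk marks the active environment
        parts = [p for p in parts if p != "*"]
        path = parts[-1]
        name = parts[0] if len(parts) > 1 else ""  # environments listed by path only have no name
        envs.append((name, path, active))
    return envs


# Shortened copy of the listing above, used as sample input.
sample = """\
# conda environments:
base                  *  /usr/local/intel/oneapi/2021.3/intelpython/latest
pytorch                  /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/pytorch
pytorch-1.8.0            /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/pytorch-1.8.0
tensorflow               /usr/local/intel/oneapi/2021.3/intelpython/latest/envs/tensorflow
"""

pytorch_envs = [name for name, _, _ in parse_conda_envs(sample) if "pytorch" in name]
print(pytorch_envs)
```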

Activate the PyTorch environment and start Python:

% conda activate pytorch
(pytorch-1.8.0) [user1@node1 ~]$ python
Python 3.7.10 (default, Jun  4 2021, 06:52:02)
[GCC 9.3.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import torch
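If the import fails, a common cause is that the Python being run does not belong to the environment you expected. The sketch below uses only the standard library to report which interpreter is running and whether it can locate a module. The helper name module_report is hypothetical.

```python
import importlib.util
import sys


def module_report(name="torch"):
    """Describe whether `name` can be found by the current interpreter."""
    spec = importlib.util.find_spec(name)  # None when the module cannot be found
    return {
        "interpreter": sys.executable,  # which Python binary is running
        "prefix": sys.prefix,           # root directory of the active environment
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in module_report("torch").items():
        print(f"{key}: {value}")
```

If the prefix does not point into the pytorch environment listed by conda, the environment was probably not activated in the current shell.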

If you still get the error “ModuleNotFoundError: No module named ‘torch’”, you may want to install PyTorch directly if you have root access:

% conda install pytorch torchvision cpuonly -c pytorch

If not, you may want to create a private environment, as described in Creating Virtual Environment with Python using venv.


Intel® Edge AI Certification

Intel® Edge AI Certification training courses can be started and completed at no charge. To get officially certified and receive a badge, you must complete the assessment and review process, which costs $99 for one year. Follow up with an annual recertification course to update your skills and credentials.

Certification training includes:

  • Hands-on experience with edge AI tools and platforms, including the Intel® Distribution of OpenVINO™ toolkit and Intel® DevCloud for the Edge
  • Use cases that detect safety gear, prevent retail losses, identify manufacturing defects, and solve other real-world problems with the combined application of computer vision and deep-learning inference.
  • Development of your own edge AI solutions portfolio, drawing on libraries and APIs for TensorFlow*, PyTorch*, Open Neural Network Exchange (ONNX*), and other public models, running on your choice of Intel® DevCloud for the Edge hardware clusters.

For more information and to sign up, click here.

Webinar: High Performance GPU Acceleration – Part 1: Code Design

  • Online Registration Here
  • Date: 13th October 2021, 9am PDT

Heterogeneous computing comes with the challenge of designing code that can work in multi-processor/accelerator environments. Developers need to be equipped with the right set of metrics to make informed design and optimization decisions that take advantage of target hardware.

In Part 1 of this 2-part webinar series, Technical Consulting Engineer Cory Levels focuses on designing software for efficient offload from CPUs to GPUs, even before final hardware is available, using Intel® Advisor. Using a walkthrough of an ISO 3DFD example (3D isotropic Finite Difference), you will learn how to:

  • Optimize your CPU application for memory and compute
  • Identify efficient GPU offload opportunities and quantify the potential performance speed up
  • See performance headroom of your GPU offloaded code against hardware limitations, and get insights for an effective optimization roadmap

For more information, take a look at the Intel site here.

Intel Unveils Second-Generation Neuromorphic Chip

Various processors and pieces of code are often compared to brains, but neuromorphic chips work to much more directly mimic neurological systems through the use of computational “neurons” that communicate with one another. Intel’s first-generation Loihi chip, introduced in 2017, has around 128,000 of those digital neurons. Over the ensuing four years, Loihi has been packed into increasingly large systems, learned to touch and even been taught to smell.

Now, it’s getting a new family member: Loihi 2. In its press release, Intel said that years of testing with the first-generation Loihi chip helped them to design a second generation with up to ten times the processing speed; up to 15 times greater resource density; and up to a million computational neurons per chip – more than seven times those in the first generation. Intel reports that early tests have shown that Loihi 2 required more than 60 times fewer ops per inference when running deep neural networks as compared to Loihi 1 (without a loss in accuracy).

Intel Unveils Loihi 2, Its Second-Generation Neuromorphic Chip, HPCWire

Displaying Intel-MPI Debug Information

Detailed information can be found at Displaying MPI Debug Information.

The I_MPI_DEBUG environment variable provides a convenient way to get detailed information about an MPI application at runtime. You can set the variable value from 0 (the default value) to 1000. The higher the value, the more debug information you get.

High values of I_MPI_DEBUG can output a lot of information and significantly reduce performance of your application. A value of I_MPI_DEBUG=5 is generally a good starting point, which provides sufficient information to find common errors.

To redirect the debug information output from stdout to stderr or a text file, use the I_MPI_DEBUG_OUTPUT environment variable:

$ mpirun -genv I_MPI_DEBUG=5 -genv I_MPI_DEBUG_OUTPUT=debug_output.txt -n 32 ./mpi_program

I_MPI_DEBUG Arguments

<level>      Indicate the level of debug information provided.
0            Output no debugging information. This is the default value.
1            Output libfabric* version and provider.
2            Output information about the tuning file used.
3            Output effective MPI rank, pid and node mapping table.
4            Output process pinning information.
5            Output environment variables specific to the Intel® MPI Library.
> 5          Add extra levels of debug information.

<flags>      Comma-separated list of debug flags
pid          Show process id for each debug message.
tid          Show thread id for each debug message for multithreaded library.
time         Show time for each debug message.
datetime     Show time and date for each debug message.
host         Show host name for each debug message.
level        Show level for each debug message.
scope        Show scope for each debug message.
line         Show source line number for each debug message.
file         Show source file name for each debug message.
nofunc       Do not show routine name.
norank       Do not show rank.
nousrwarn    Suppress warnings for improper use case (for example, incompatible combination of controls).
flock        Synchronize debug output from different process or threads.
nobuf        Do not use buffered I/O for debug output.
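The level and flags can be combined into a single value. The sketch below assumes the form <level>[,<flag>,...] for I_MPI_DEBUG and assembles an mpirun command line like the one shown earlier without executing it. The helper names debug_value and mpirun_command are hypothetical, and the flag set is copied from the arguments listed above.

```python
import shlex

# Flags copied from the I_MPI_DEBUG argument list.
KNOWN_FLAGS = {
    "pid", "tid", "time", "datetime", "host", "level", "scope",
    "line", "file", "nofunc", "norank", "nousrwarn", "flock", "nobuf",
}


def debug_value(level, flags=()):
    """Build an I_MPI_DEBUG value, assuming the form <level>[,<flag>,...]."""
    if not 0 <= level <= 1000:
        raise ValueError("level must be between 0 and 1000")
    unknown = [f for f in flags if f not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown debug flags: {unknown}")
    return ",".join([str(level), *flags])


def mpirun_command(program, nprocs, level=5, flags=(), output=None):
    """Return the mpirun argument list; nothing is executed here."""
    cmd = ["mpirun", "-genv", f"I_MPI_DEBUG={debug_value(level, flags)}"]
    if output is not None:
        # Redirect debug output to a file (or stdout/stderr) when requested.
        cmd += ["-genv", f"I_MPI_DEBUG_OUTPUT={output}"]
    cmd += ["-n", str(nprocs), program]
    return cmd


cmd = mpirun_command("./mpi_program", 32, level=5, flags=("pid", "host"),
                     output="debug_output.txt")
print(shlex.join(cmd))
```

Building the argument list separately makes it easy to review the command before passing it to a job script or to subprocess.run.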


  1. Displaying MPI Debug Information
  2. Developer Reference: I_MPI_DEBUG

Installing Intel® oneAPI AI Analytics Toolkit

What is included in the Intel oneAPI AI Analytics Toolkit? For more information, take a look at Intel oneAPI AI Analytics Toolkit.

  • Intel® Distribution for Python*
  • Intel® Distribution of Modin* (via Anaconda distribution of the toolkit using the Conda package manager)
  • Intel® Low Precision Optimization Tool
  • Intel® Optimization for PyTorch*
  • Intel® Optimization for TensorFlow*
  • Model Zoo for Intel® Architecture
  • Download size: 2.18 GB
  • Date: August 2, 2021
  • Version: 2021.3

Command Line Installation

wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18040/l_AIKit_p_2021.3.0.1370_offline.sh

sudo bash l_AIKit_p_2021.3.0.1370_offline.sh

Installation Instruction

Step 1: From the console, locate the downloaded install file.
Step 2: Use $ sudo sh ./<installer>.sh to launch the GUI Installer as root.
Optionally, use $ sh ./<installer>.sh to launch the GUI Installer as the current user.
Step 3: Follow the instructions in the installer.
Step 4: Explore the Get Started Guide.


  1. Intel oneAPI AI Analytics Toolkit

Installing Intel oneAPI HPC Toolkit for Linux

What is included in the oneAPI installer? For more information, take a look at Get the Intel® oneAPI HPC Toolkit

  • Intel® oneAPI DPC++/C++ Compiler
  • Intel® oneAPI Fortran Compiler
  • Intel® C++ Compiler Classic
  • Intel® Cluster Checker
  • Intel® Inspector
  • Intel® MPI Library
  • Intel® Trace Analyzer and Collector
  • Download size: 1.25 GB
  • Version: 2021.3
  • Date: June 21, 2021

Command Line Installation

wget https://registrationcenter-download.intel.com/akdlm/irc_nas/17912/l_HPCKit_p_2021.3.0.3230_offline.sh

sudo bash l_HPCKit_p_2021.3.0.3230_offline.sh

Installation Instruction:

  • Step 1: From the console, locate the downloaded install file.
  • Step 2: Use $ sudo sh ./<installer>.sh to launch the GUI Installer as root.
    Optionally, use $ sh ./<installer>.sh to launch the GUI Installer as the current user.
  • Step 3: Follow the instructions in the installer.
  • Step 4: Explore the Get Started Guide.