Intel OpenVINO Inference Engine


In this article, I will explain how an application communicates with the Intel OpenVINO Inference Engine to run a model and get its output. Since OpenVINO is best suited for computer vision applications, I will use a computer vision application for the demo.

Key Terms


1. Inference Engine

Provides a library of computer vision functions, supports calls to other computer vision libraries such as OpenCV, and performs optimized inference on Intermediate Representation (IR) models. It works with hardware-specific plugins, which allow further optimization for different devices.

2. Synchronous

A synchronous request waits until the current request has completed before the next one starts. For example, a program has to wait for user feedback before it can analyze more data.

3. Asynchronous

Asynchronous requests can run at the same time, so the next request does not have to wait for the previous one to complete. For example, the user can keep using the application while it waits for a reply from a server whose response time is unpredictable.
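The difference can be illustrated without OpenVINO. The sketch below uses Python's concurrent.futures with a hypothetical fake_infer function that simply sleeps to stand in for an inference request; it is an analogy for the two request styles, not the Inference Engine API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame_id, delay=0.1):
    """Hypothetical stand-in for one inference request: waits, then returns a label."""
    time.sleep(delay)
    return f"result-{frame_id}"


def run_sync(frame_ids):
    # Synchronous: each request must finish before the next one starts.
    return [fake_infer(frame_id) for frame_id in frame_ids]


def run_async(frame_ids, workers=4):
    # Asynchronous: all requests are submitted at once and run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fake_infer, frame_id) for frame_id in frame_ids]
        # The program is free to do other work here while requests are in flight.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for runner in (run_sync, run_async):
        start = time.perf_counter()
        results = runner(range(4))
        print(f"{runner.__name__}: {len(results)} results in {time.perf_counter() - start:.2f} s")
```

With four requests of 0.1 s each, the synchronous loop needs about four request times, while the asynchronous version should finish in roughly the time of one, because the requests overlap.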

4. IE Core

The main Python wrapper for working with the Inference Engine. It is used to load an IENetwork, check which layers of a given network are supported, and add any required CPU extensions (CPU extensions are no longer needed from the 2020R1 release onwards).

5. IE Network

A class that holds a model read from an Intermediate Representation (IR). It can be loaded into an IECore, which returns it as an Executable Network.

6. Executable Network

An instance of a network loaded into an IECore and ready for inference. It supports both synchronous and asynchronous requests and holds a set of InferRequest objects.

7. InferRequest

Individual inference requests sent to the Inference Engine, for example one request per image. Each request holds its inputs and, once complete, the outputs of the inference.

Intel OpenVINO Python API

Intel OpenVINO provides a Python API that can be used to get the desired result. I will explain only the classes and functions used in the demo. To learn about each class of the API, visit

1. ie_api.IECore

This class represents an Inference Engine entity and lets you work with plugins through a unified interface.
  1. __init__(self, xml_config_file="")
    Class constructor; it returns an instance of the IECore class.
    • xml_config_file is the full path to the .xml file containing the plugin configuration. If it is left empty, the default configuration is used.
  2. add_extension(self, extension_path, device_name)
    Loads an extension library into the plugin for the specified device. This method returns nothing.
    • extension_path is the path to the extension library file to load
    • device_name is the name of the device whose plugin should load the extension
  3. load_network(self, network, device_name, config=None, num_requests=1)
    Loads an IENetwork object read from the IR into the plugin for the specified device and returns an ExecutableNetwork object.
    • network is the IENetwork object to load
    • device_name is the name of the target device. Values can be CPU, GPU, MYRIAD, FPGA.0, FPGA.1
    • config is a dict of plugin configuration keys and their values
    • num_requests is the number of infer requests to create for the returned object; 0 means that the optimal number of requests is created
  4. query_network(self, network, device_name, config=None)
    Asks the plugin for the specified device which layers of the network it supports in the current configuration. It returns a dictionary that maps each supported layer name to the device name.
  Notes:
  1. num_requests can be understood as the number of inference requests that can run in parallel.
  2. Multiple devices can be used together through the HETERO plugin, which I will talk about in upcoming articles.
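A common use of query_network is to check, before loading, whether the target device supports every layer of the network. The helper below is a minimal sketch of that check. find_unsupported_layers is a hypothetical name; it assumes the layer names come from IENetwork.layers (available in the 2019 and 2020 releases) and that query_network returns a dict keyed by supported layer names, as described above.

```python
def find_unsupported_layers(network_layers, supported_layers):
    """Return the layer names that are missing from the query_network() result.

    network_layers: iterable of all layer names in the network
                    (assumed: the keys of IENetwork.layers).
    supported_layers: dict mapping supported layer names to device names,
                      as returned by IECore.query_network().
    """
    return [name for name in network_layers if name not in supported_layers]


if __name__ == "__main__":
    # Hypothetical data standing in for a real network and query result.
    # With OpenVINO this would be roughly:
    #     supported = plugin.query_network(network=network, device_name="CPU")
    #     missing = find_unsupported_layers(network.layers.keys(), supported)
    layers = ["data", "conv1", "relu1", "custom_op", "prob"]
    supported = {"data": "CPU", "conv1": "CPU", "relu1": "CPU", "prob": "CPU"}
    missing = find_unsupported_layers(layers, supported)
    if missing:
        print("Unsupported layers:", ", ".join(missing))
```

If the list is not empty, the usual options are to add an extension with add_extension, choose another device, or split the network across devices.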

2. ie_api.IENetwork

This class holds information about the network read from the IR and lets you modify some model parameters, such as layer affinity and output layers.
  1. __init__(self, model, weights, init_from_buffer)
    IENetwork class constructor; it returns an instance of IENetwork.
    • model is the path to the .xml file
    • weights is the path to the .bin file
    • init_from_buffer controls how model and weights are interpreted: if False, they are strings with file paths; if True, they are Python bytes objects with the file contents.
  2. reshape(self, input_shapes)
    Reshapes the network to change its spatial dimensions, batch size, or any other dimension.
    • input_shapes is a dict that maps input layer names to tuples with the target shape
  3. serialize(self, path_to_xml, path_to_bin)
    The network can be serialized and stored in files. Nothing is returned by this method.
    • path_to_xml represents the file where the serialized model will be stored
    • path_to_bin represents the file where the serialized weights will be stored.
  Note: model and weights can be either string paths or bytes holding the file contents.
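To make the input_shapes argument of reshape concrete, here is a small sketch that builds the dict for a new batch size. with_batch_size is a hypothetical helper, and it assumes NCHW input shapes such as those returned by network.inputs[name].shape in the demo below.

```python
def with_batch_size(input_shapes, batch_size):
    """Return a copy of input_shapes with the first (N) dimension replaced.

    input_shapes maps input layer names to NCHW shapes, e.g. {"data": (1, 3, 256, 456)}.
    """
    return {name: (batch_size, *shape[1:]) for name, shape in input_shapes.items()}


if __name__ == "__main__":
    # Hypothetical shapes; with OpenVINO they could come from
    #     {name: network.inputs[name].shape for name in network.inputs}
    # and the result would be passed to network.reshape(...) before load_network.
    shapes = {"data": (1, 3, 256, 456)}
    print(with_batch_size(shapes, 4))
```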

3. ie_api.ExecutableNetwork

The class represents a network instance loaded into a plugin and ready for inference.
  1. __init__(self)
    There is no explicit class constructor. To get a valid ExecutableNetwork instance, call ie_api.IECore.load_network.
  2. infer(self, inputs=None)
    Starts synchronous inference for the first infer request of the executable network and returns its output data: a dict mapping output layer names to numpy.ndarray objects with the layer outputs.
    • inputs is a dict that maps input layer names to numpy.ndarray objects of the proper shape holding the input data for each layer
  3. start_async(self, request_id, inputs=None)
    Starts asynchronous inference for the specified infer request and returns the handler of that request, an InferRequest object.
    • request_id is the index of the infer request to start
    • inputs is a dict that maps input layer names to numpy.ndarray objects of the proper shape holding the input data for each layer
  4. wait(self, num_requests=None, timeout=None)
    Blocks until the result of any request becomes available or the timeout elapses, whichever comes first. It returns the status code OK or RESULT_NOT_READY.
    • num_requests is the number of idle requests to wait for. By default, it is set to the number of requests.
    • timeout is the time to wait in milliseconds, or one of the special values 0 and -1. The default value is -1.
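When load_network is called with num_requests greater than 1, each start_async call needs a request_id. One simple approach, sketched below with a hypothetical schedule_requests helper, is to cycle through the request ids round-robin. The sketch deliberately leaves out the waiting step: in a real loop you would call wait() on a request before reading its outputs or reusing its id.

```python
def schedule_requests(frames, num_requests):
    """Pair each frame with an infer request id, cycling through 0..num_requests-1."""
    if num_requests < 1:
        raise ValueError("num_requests must be at least 1")
    return [(index % num_requests, frame) for index, frame in enumerate(frames)]


if __name__ == "__main__":
    for request_id, frame in schedule_requests(["frame0", "frame1", "frame2"], num_requests=2):
        # With a real ExecutableNetwork this would be roughly:
        #     exec_network.start_async(request_id=request_id, inputs={input_blob: frame})
        print(request_id, frame)
```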

4. ie_api.InferRequest

This class provides an interface to the infer requests of an ExecutableNetwork; it is used to run them and to collect their output data.
  1. __init__(self)
    There is no explicit class constructor. To get valid InferRequest instances, call ie_api.IECore.load_network, which returns an ExecutableNetwork that stores the infer requests.
  2. async_infer(self, inputs=None)
    Starts asynchronous inference of the infer request and fills the outputs array.
    • inputs is a dict that maps input layer names to numpy.ndarray objects of the proper shape holding the input data for each layer
  3. get_perf_counts(self)
    Queries per-layer performance measures so you can see which layers take the most time. It returns a dict containing per-layer execution information.
  4. infer(self, inputs=None)
    Starts synchronous inference of the infer request and fills the outputs array.
    • inputs is a dict that maps input layer names to numpy.ndarray objects of the proper shape holding the input data for each layer
  5. wait(self, timeout=None)
    Blocks until the result of this request becomes available or the timeout elapses, whichever comes first.
    • timeout is the time to wait in milliseconds, or one of the special values 0 and -1. The default value is -1. A value of 0 returns the inference status immediately without blocking, and -1 waits until the inference result becomes available.
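The output of get_perf_counts is easiest to read when sorted. The sketch below assumes each per-layer entry contains a "real_time" field (an execution time, assumed to be in microseconds), as in the 2020-era Python API; slowest_layers is a hypothetical helper and the sample numbers are made up. Per-layer counters may also need to be enabled when loading the network, for example with config={"PERF_COUNT": "YES"}.

```python
def slowest_layers(perf_counts, top_n=3):
    """Return (layer_name, real_time) pairs for the top_n slowest layers.

    perf_counts is assumed to look like {layer_name: {"real_time": ..., ...}}.
    """
    timings = [(name, stats["real_time"]) for name, stats in perf_counts.items()]
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical counters; with OpenVINO they would come from something like
    #     request = exec_network.requests[0]
    #     request.infer({input_blob: image})
    #     counts = request.get_perf_counts()
    counts = {
        "conv1": {"layer_type": "Convolution", "real_time": 850},
        "relu1": {"layer_type": "ReLU", "real_time": 0},
        "fc1": {"layer_type": "FullyConnected", "real_time": 1200},
    }
    for name, real_time in slowest_layers(counts, top_n=2):
        print(f"{name}: {real_time} us (assumed unit)")
```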

Demo Application

Now I will demonstrate how to use all the concepts we have studied. Before reading this, I recommend going through the previous articles in this series.
I will be using the following pre-trained models:
  1. Human Pose Estimation: human-pose-estimation-0001
  2. Text Detection: text-detection-0004
  3. Determining Car Type & Color: vehicle-attributes-recognition-barrier-0039
The application annotates the input image with the features detected by the chosen model. Now let's start programming.
  import argparse
  import cv2
  import numpy as np

  from handle_models import handle_output, preprocessing
  from inference import Network
In the above code, we import the required libraries:
  1. argparse is used to build the command-line argument structure
  2. cv2 is the OpenCV library
  3. numpy is used for basic operations on numpy.ndarray objects; to learn about numpy, visit
  4. handle_models is the Python script that contains the pre- and post-processing logic
  5. inference is the Python script that communicates with the OpenVINO Python API
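The article does not show the argparse setup, but the code below relies on args.i, args.t, args.m, args.c and args.d. A minimal sketch of a parser that provides those attributes follows; the help texts, the choices for -t, and the default of "CPU" for -d are assumptions chosen to match the commands used at the end of the article.

```python
import argparse


def get_args(argv=None):
    """Parse the demo's command-line flags (a sketch; the original parser is not shown)."""
    parser = argparse.ArgumentParser(description="Basic Inference Engine demo")
    parser.add_argument("-i", required=True, help="path to the input image")
    parser.add_argument("-t", required=True, choices=["POSE", "TEXT", "CAR_META"],
                        help="model type")
    parser.add_argument("-m", required=True, help="path to the model's .xml IR file")
    parser.add_argument("-c", default=None,
                        help="path to a CPU extension library (optional)")
    parser.add_argument("-d", default="CPU",
                        help="target device, e.g. CPU, GPU or MYRIAD")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # In the real script, call get_args() with no arguments to read sys.argv.
    args = get_args(["-i", "images/sign.jpg", "-t", "TEXT",
                     "-m", "models/text-detection-0004.xml"])
    print(args.i, args.t, args.m, args.c, args.d)
```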
  CAR_COLORS = ["white", "gray", "yellow", "red", "green", "blue", "black"]
  CAR_TYPES = ["car", "bus", "truck", "van"]
The above code is specific to the car model: it defines the car colors and car types that the model's output indices map to.
  def get_mask(processed_output):
      # Create an empty array for the other color channels of the mask
      empty = np.zeros(processed_output.shape)
      # Stack to make a green mask
      mask = np.dstack((empty, processed_output, empty))
      return mask
In the above code, get_mask turns a single-channel output into a three-channel mask with the output in the green channel (OpenCV uses BGR order, so the middle channel is green).
  if model_type == "POSE":
      # Remove the last channel (background), which is not a keypoint heatmap
      output = output[:-1]
      # Keep only pose detections above 0.5 confidence, set to 255
      for c in range(len(output)):
          output[c] = np.where(output[c] > 0.5, 255, 0)
      # Sum along the "class" axis
      output = np.sum(output, axis=0)
      # Get a semantic mask
      pose_mask = get_mask(output)
      # Combine with the original image
      image = image + pose_mask
      return image
The above code is the POSE branch of create_output_image(model_type, image, output). It keeps only keypoint detections above 0.5 confidence, merges them into one map, and highlights the detected key points in green on the original image.
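To see what the POSE branch and get_mask do to the data, the sketch below runs the same steps on a tiny made-up output: two 2x2 keypoint heatmaps plus a background channel. It uses only NumPy, applies the threshold to all channels at once instead of looping, and the numbers are hypothetical.

```python
import numpy as np


def get_mask(processed_output):
    # Same idea as the demo's get_mask: the output goes in the middle (green) channel.
    empty = np.zeros(processed_output.shape)
    return np.dstack((empty, processed_output, empty))


# Hypothetical processed output: 2 keypoint heatmaps + 1 background channel, each 2x2.
output = np.array([
    [[0.9, 0.1], [0.2, 0.0]],   # keypoint A
    [[0.0, 0.7], [0.6, 0.3]],   # keypoint B
    [[0.5, 0.5], [0.5, 0.5]],   # background channel, dropped below
])
output = output[:-1]                          # drop the last channel
output = np.where(output > 0.5, 255, 0)       # 255 above 0.5 confidence, else 0
output = np.sum(output, axis=0)               # merge keypoints into one 2x2 map
mask = get_mask(output)                       # shape (2, 2, 3), map in channel 1
print(mask[..., 1])
```

Only the pixels where some keypoint exceeded 0.5 end up at 255 in the green channel; the red and blue channels stay at 0, so adding the mask to the image tints those pixels green.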
  elif model_type == "TEXT":
      # Get only text detections above 0.5 confidence, set to 255
      output = np.where(output[1] > 0.5, 255, 0)
      # Get a semantic mask
      text_mask = get_mask(output)
      # Add the mask to the image
      image = image + text_mask
      return image
The above code is the TEXT branch: it highlights the detected text in green, using the second channel (index 1) of the processed text detection output.
  elif model_type == "CAR_META":
      # Get the color and car type from their lists
      color = CAR_COLORS[output[0]]
      car_type = CAR_TYPES[output[1]]
      # Scale the output text by the image shape
      scaler = max(int(image.shape[0] / 1000), 1)
      # Write the text of color and type onto the image
      image = cv2.putText(image, "Color: {}, Type: {}".format(color, car_type),
                          (50 * scaler, 100 * scaler), cv2.FONT_HERSHEY_SIMPLEX,
                          2 * scaler, (255, 255, 255), 3 * scaler)
      return image
The above code is the CAR_META branch: it writes the predicted car color and type as white text onto the image.
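The scaling arithmetic can be checked on its own. text_layout below is a hypothetical helper that reproduces the values passed to cv2.putText for a given image height, without calling OpenCV.

```python
def text_layout(image_height):
    """Return (scaler, origin, font_scale, thickness) as computed in the CAR_META branch."""
    scaler = max(int(image_height / 1000), 1)
    origin = (50 * scaler, 100 * scaler)   # bottom-left corner of the text, (x, y)
    font_scale = 2 * scaler
    thickness = 3 * scaler
    return scaler, origin, font_scale, thickness


for height in (720, 1999, 2160):
    print(height, text_layout(height))
```

Because int() truncates, any image shorter than 2000 pixels gets scaler 1; the text only grows for images 2000 pixels tall or more.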
  # Create a Network for using the Inference Engine
  inference_network = Network()
  # Load the model into the network and obtain its input shape
  n, c, h, w = inference_network.load_model(args.m, args.d, args.c)
In the above code, we instantiate the Network class and load the model by passing the path to the IR .xml file (args.m), the device on which to run the application (args.d), and the CPU extension path (args.c). load_model returns the network's input shape, which we unpack as n, c, h, w.
  # Read the input image
  image = cv2.imread(args.i)
  preprocessed_image = preprocessing(image, h, w)
The above code reads the input image and preprocesses it to the network's input height and width using handle_models.preprocessing.
  # Perform synchronous inference on the image
  inference_network.sync_inference(preprocessed_image)

  # Obtain the output of the inference request
  output = inference_network.extract_output()

  # Post-process the raw output according to the model type
  output_func = handle_output(args.t)
  processed_output = output_func(output, image.shape)

  # Create an output image based on the network's output
  try:
      output_image = create_output_image(args.t, image, processed_output)
      print("Success")
  except Exception as err:
      output_image = image
      print("Failure: {}".format(err))

  # Save the resulting image
  cv2.imwrite("outputs/{}-output.png".format(args.t), output_image)
In the above code, we run inference in synchronous mode, pass the raw output to the post-processing function selected for the model type, and send the processed output to create_output_image. Finally, the result is saved as outputs/<MODEL_TYPE>-output.png, for example outputs/POSE-output.png (the outputs folder must already exist, because cv2.imwrite does not create directories).
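  def handle_pose(output, input_shape):
      # Extract the keypoint heatmaps blob
      heatmaps = output["Mconv7_stage2_L2"]
      # Resize each heatmap back to the size of the input image
      out_heatmap = np.zeros([heatmaps.shape[1], input_shape[0], input_shape[1]])
      for h in range(len(heatmaps[0])):
          out_heatmap[h] = cv2.resize(heatmaps[0][h], input_shape[0:2][::-1])
      return out_heatmap
The above code, handle_pose, extracts the pose estimation output. As described in the model's official documentation, "Mconv7_stage2_L2" is the output blob holding the keypoint heatmaps. Each heatmap is resized to the input image size; cv2.resize expects the size as (width, height), which is why input_shape[0:2] is reversed.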
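The per-channel resize in handle_pose can be sketched with NumPy alone. The helper below is a hypothetical nearest-neighbour stand-in for the cv2.resize loop (cv2.resize interpolates bilinearly by default), and the [1, 19, 32, 57] blob shape and the small image shape are assumptions used only for illustration.

```python
import numpy as np


def resize_channels_nearest(blob, out_height, out_width):
    """Resize each channel of a [1, C, H, W] blob to a [C, out_height, out_width] array.

    Nearest-neighbour stand-in for the per-channel cv2.resize loop in handle_pose.
    """
    _, _, in_height, in_width = blob.shape
    # Source row/column index that each output row/column maps to.
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    # Fancy indexing picks those rows and columns for every channel at once.
    return blob[0][:, rows[:, None], cols[None, :]]


# Hypothetical heatmap blob and input image shape (height, width, channels).
heatmaps = np.random.rand(1, 19, 32, 57).astype(np.float32)
input_shape = (64, 114, 3)

out_heatmap = resize_channels_nearest(heatmaps, input_shape[0], input_shape[1])
print(out_heatmap.shape)        # (19, 64, 114)
print(input_shape[0:2][::-1])   # (114, 64): (width, height), the order cv2.resize expects
```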
  def handle_text(output, input_shape):
      # Extract the "model/link_logits_/add" output blob
      first_blob = output["model/link_logits_/add"]
      # Resize each channel back to the size of the input image
      out_blob = np.zeros([first_blob.shape[1], input_shape[0], input_shape[1]])
      for h in range(len(first_blob[0])):
          out_blob[h] = cv2.resize(first_blob[0][h], input_shape[0:2][::-1])
      return out_blob
The above code, handle_text, extracts the text detection output. As per the official documentation, "model/link_logits_/add" is the output blob used here; each of its channels is resized to the input image size.
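  def handle_car(output, input_shape):
      # Index of the highest score for color and for type
      color = np.argmax(output["color"].flatten())
      ttype = np.argmax(output["type"].flatten())
      return color, ttype
The above code, handle_car, extracts the car metadata. We flatten the color and type outputs into 1-D arrays and take the index of the highest score in each; these indices select entries from CAR_COLORS and CAR_TYPES.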
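Here is a minimal sketch of how handle_car's indices turn into labels, using the color and type lists from the demo and made-up scores in the [1, C, 1, 1] layout assumed for the "color" and "type" outputs (7 colors, 4 types).

```python
import numpy as np

CAR_COLORS = ["white", "gray", "yellow", "red", "green", "blue", "black"]
CAR_TYPES = ["car", "bus", "truck", "van"]

# Hypothetical raw scores in the assumed [1, C, 1, 1] layout.
output = {
    "color": np.array([0.01, 0.02, 0.01, 0.05, 0.01, 0.85, 0.05]).reshape(1, 7, 1, 1),
    "type": np.array([0.90, 0.03, 0.02, 0.05]).reshape(1, 4, 1, 1),
}

color = np.argmax(output["color"].flatten())   # index of the highest color score
ttype = np.argmax(output["type"].flatten())    # index of the highest type score
print(CAR_COLORS[color], CAR_TYPES[ttype])     # blue car
```

np.argmax already returns an index into the flattened array when no axis is given, so the flatten() calls are not strictly required; they mainly make the intent explicit.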
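  def preprocessing(input_image, height, width):
      image = np.copy(input_image)
      # Resize to the network's input size; cv2.resize takes (width, height)
      image = cv2.resize(image, (width, height))
      # Change the layout from HWC to CHW
      image = image.transpose((2, 0, 1))
      # Add the batch dimension: 1 x C x H x W
      image = image.reshape(1, 3, height, width)
      return image
The above code preprocesses the input image: it resizes the image to the network's input size, moves the color channels to the front, and adds a batch dimension.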
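The layout change in preprocessing can be checked with NumPy alone. to_nchw is a hypothetical helper that performs only the transpose and reshape steps (the cv2.resize step is left out so the sketch runs without OpenCV) on a made-up 2x3 image.

```python
import numpy as np


def to_nchw(image):
    """Convert a resized H x W x C image into a 1 x C x H x W batch."""
    height, width, channels = image.shape
    chw = image.transpose((2, 0, 1))                 # HWC -> CHW
    return chw.reshape(1, channels, height, width)   # add the batch dimension


# Hypothetical 2x3 image with 3 channels; each value is unique so positions are traceable.
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)
batch = to_nchw(image)
print(batch.shape)                          # (1, 3, 2, 3)
print(batch[0, 2, 1, 0] == image[1, 0, 2])  # True: pixel (y=1, x=0), channel 2
```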
  def handle_output(model_type):
      if model_type == "POSE":
          return handle_pose
      elif model_type == "TEXT":
          return handle_text
      elif model_type == "CAR_META":
          return handle_car
      else:
          return None
The above code decides which post-processing function to return based on the model type provided.
  import os
  import sys
  import logging as log
  from openvino.inference_engine import IENetwork, IECore
The above code, from inference.py, imports the necessary libraries:
  1. os and sys give access to operating-system and Python interpreter functions
  2. logging is optional; you may use it to log errors, warnings, and information
  3. IENetwork and IECore are imported from openvino.inference_engine
  model_xml = model
  model_bin = os.path.splitext(model_xml)[0] + ".bin"
In the above code, we define the paths of the .xml and .bin files. For the .bin file, we strip the ".xml" extension from the given file name and append ".bin".
  # Initialize the plugin
  self.plugin = IECore()

  # Add a CPU extension, if applicable
  if cpu_extension and "CPU" in device:
      self.plugin.add_extension(cpu_extension, device)

  # Read the IR as an IENetwork
  network = IENetwork(model=model_xml, weights=model_bin)
In the above code, we instantiate the IECore class and, when running on the CPU, add the CPU extension library. In releases before 2020R1, some layers were not implemented natively by the CPU plugin, and the CPU extension library provided implementations for them.
Then we pass the .xml and .bin file paths to IENetwork, which reads the IR into a network object that can be loaded into the plugin.
  # Load the IENetwork into the plugin
  self.exec_network = self.plugin.load_network(network, device)

  # Get the input layer
  self.input_blob = next(iter(network.inputs))

  # Return the input shape (to determine preprocessing)
  return network.inputs[self.input_blob].shape
In the above code, we load the IENetwork into the plugin, which returns an ExecutableNetwork. We then get the name of the first input layer with next(iter(network.inputs)), which returns the first key of the inputs dictionary, and return that layer's shape so that the caller can preprocess images accordingly.
  self.exec_network.infer({self.input_blob: image})
In the above code, we perform synchronous inference. ExecutableNetwork.infer() blocks until the result is ready, so no separate wait call is needed; with asynchronous requests (start_async), you would call wait() before reading the output.
  return self.exec_network.requests[0].outputs
In the above code, we read the outputs of the first infer request (index 0), because ExecutableNetwork.infer() runs on that request. The result is a dictionary that maps output layer names to numpy arrays.
To execute the application for the Car Meta Data model, run the following command, replacing <main_script>.py with the name of the demo script that contains the code above:
  python <main_script>.py -i "images/blue-car.jpg" -t "CAR_META" \
    -m "/home/workspace/models/vehicle-attributes-recognition-barrier-0039.xml" \
    -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/"
To execute the application for text detection:
  python <main_script>.py -i "images/sign.jpg" -t "TEXT" \
    -m "/home/workspace/models/text-detection-0004.xml" \
    -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/"
To execute the application for human pose estimation:
  python <main_script>.py -i "images/sitting-on-car.jpg" -t "POSE" \
    -m "/home/workspace/models/human-pose-estimation-0001.xml" \
    -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/"
  1. I have attached all the code and all 3 input images; you can try the same with different models.
  2. The code is the one I used during my Udacity Nanodegree.
  3. I used Linux as the base OS because I had some issues running the application on Windows. You may use the same commands on other systems, but change the parameter paths accordingly.


In this article, I explained how the Inference Engine works and how to use it to create a demo edge application. We will dive deeper in the coming articles, so stay tuned to C# Corner for more.
For any doubts, feel free to comment, and if you like the article, do give it a like.