Processing Images using Intel OpenVINO


In this article, we will learn to write Python scripts that pre-process images. Because we will use OpenCV to do so, I will also discuss OpenCV and the OpenCV functions that we need.


OpenCV is a programming library aimed primarily at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and is publicly available under an open-source BSD license. It was first released in June 2000.

OpenCV Python API Commands 

Some OpenCV commands that we will be using from this article onwards:

1. To Read the Video/Image

  1. cv2.VideoCapture()
    To initialize the VideoCapture object
  2. cv2.VideoCapture(filename)
    To load the filename into the VideoCapture object
  3. cv2.VideoCapture(device)
    To load the device id into the VideoCapture object
  4. cv2.VideoCapture.open(filename)
    To load and open the filename into the VideoCapture object
  5. cv2.VideoCapture.open(device)
    To load and open the device into the VideoCapture object
  6. cv2.VideoCapture.release()
    To close the file or capturing device
  7. cv2.VideoCapture.read()
    To grab, decode and return the next video frame
  • filename
    Name of the opened video file (ex. video.avi) or an image sequence (ex. img_%02d.jpg, which will read samples img_00.jpg, img_01.jpg, and so on)
  • device
    Id of the opened video capturing device (i.e. a camera index). If there is a single camera connected, just pass 0.
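
The read-and-release pattern from the list above can be sketched as a small generator. This is an illustrative sketch, not part of the original scripts: the helper name read_frames and its max_frames parameter are assumptions. It relies only on the read() and release() methods listed above, where read() returns a success flag together with the frame.

```python
def read_frames(capture, max_frames=None):
    """Yield frames from an opened capture object, then release it.

    `capture` is expected to behave like cv2.VideoCapture: read() returns
    (ok, frame) and release() closes the file or device.
    """
    count = 0
    try:
        while max_frames is None or count < max_frames:
            ok, frame = capture.read()
            if not ok:  # no more frames, or the device stopped delivering
                break
            yield frame
            count += 1
    finally:
        # Always close the file or device, even if the caller stops early
        capture.release()
```

With OpenCV installed, this could be used as `for frame in read_frames(cv2.VideoCapture("video.avi")): ...`, or with `cv2.VideoCapture(0)` for the first connected camera.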

2. To Write the Video/Image

  1. cv2.VideoWriter()
    To create default VideoWriter Object
  2. cv2.VideoWriter([filename, fourcc, fps, framesize [,isColor]])
    To create parameterized VideoWriter Object
  3. cv2.VideoWriter.isOpened()
    To return whether the VideoWriter object has been initialized
  4. cv2.VideoWriter.open(filename, fourcc, fps, framesize [, isColor])
    To open the VideoWriter object to start writing
  5. cv2.VideoWriter.write(image)
    To write the next video frame
  • filename
    Name of the output video file
  • framesize
    Size of the video frame
  • fps
    Frame rate of the created/loaded video
  • fourcc
    4 character code of codec used to compress the frames
  • isColor 
    If not zero, the encoder will expect and encode color frames, otherwise it will work with grayscale

3. To Change the Size

dst = cv2.resize(src, dsize [, dst [, fx [, fy [, interpolation]]]])
  • src
    Input image (an image array, for example the result of cv2.imread())
  • dst
    Output image
  • dsize
    Size of the output image, given as (width, height). If it is zero (None in Python), it is computed as:
    dsize = Size(round(fx*src.cols), round(fy*src.rows))
  • fx
    Scale factor along the horizontal axis. If it is 0, it is computed as:
    fx = (double) dsize.width/src.cols
  • fy
    Scale factor along the vertical axis. If it is 0, it is computed as:
    fy = (double) dsize.height/src.rows
  • interpolation
    Interpolation method (for example cv2.INTER_LINEAR)
4. To Change the Color of Image/Frame of Video

dst = cv2.cvtColor(src, code [, dst [, dstCn]])
  • src
    Input image
  • dst
    Output image
  • code
    Color space conversion code
    1. cv2.COLOR_BGR2GRAY
      To convert BGR to Gray
    2. cv2.COLOR_RGB2GRAY
      To convert RGB to Gray
    3. cv2.COLOR_GRAY2BGR
      To convert Gray to BGR
    4. cv2.COLOR_GRAY2RGB
      To convert Gray to RGB
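
To make the BGR-to-gray conversion less abstract, here is a NumPy-only sketch of the weighted sum that the OpenCV documentation gives for this conversion (Y = 0.299 R + 0.587 G + 0.114 B). It is an approximation for illustration: the helper name bgr_to_gray is an assumption, and OpenCV's own implementation may differ by a rounding step, so cv2.cvtColor remains the function to use in practice.

```python
import numpy as np

# Weights for the B, G and R channels, in that order (OpenCV stores images as BGR)
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])


def bgr_to_gray(image_bgr):
    """Approximate a BGR-to-gray conversion as a weighted sum of the channels."""
    gray = image_bgr.astype(np.float64) @ BGR_WEIGHTS
    return np.rint(gray).astype(np.uint8)


# A 1x2 image: one pure blue pixel and one white pixel
pixels = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)
gray = bgr_to_gray(pixels)
print(gray.shape)  # (1, 2): the channel axis is gone
```

Blue contributes the least to the gray value, which is why the pure blue pixel comes out dark while the white pixel stays at full brightness.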

5. To Construct Rectangle

  1. cv2.rectangle(img, pt1, pt2, color [, thickness [, lineType [, shift ]]] )
  2. cv2.rectangle(img, rec, color [, thickness [, lineType [, shift ]]] )
  • img
    Image on which the rectangle is drawn
  • pt1
    Vertex of rectangle
  • pt2
    Vertex of rectangle opposite to pt1
  • color
    Rectangle color or brightness
  • thickness
    Thickness of lines
  • lineType
    Type of line
  • shift 
    Number of fractional bits in the point coordinates

6. To Read an Image

cv2.imread(filename [, flags ])
  • flags
    • >0, return a 3-channel color image
    • =0, return a grayscale image
    • <0, return the loaded image as is (with the alpha channel, if present)

7. To Write an Image

cv2.imwrite(filename, img [, params])
  • params
    • cv2.IMWRITE_JPEG_QUALITY: JPEG quality, from 0 to 100
    • cv2.IMWRITE_PNG_COMPRESSION: PNG compression level, from 0 to 9
    • cv2.IMWRITE_PXM_BINARY: binary format flag for PPM, PGM or PBM, 0 or 1

8. For Canny Edge Detection

  1. edges = cv2.Canny(image, threshold1, threshold2 [, edges [, apertureSize [, L2gradient ]]])
  2. edges = cv2.Canny(dx, dy, threshold1, threshold2 [, edges [, L2gradient ]])
  • image
    Single-channel 8-bit input image
  • edges
    Output edge map, it has the same size and type as image
  • threshold1
    1st threshold for the hysteresis procedure
  • threshold2
    2nd threshold for the hysteresis procedure
  • apertureSize
    Aperture size for the Sobel() operator
  • L2gradient
    A flag indicating whether the more accurate L2 norm, sqrt((dI/dx)^2 + (dI/dy)^2), should be used to calculate the image gradient magnitude (L2gradient=true), or whether the default L1 norm, |dI/dx| + |dI/dy|, is enough (L2gradient=false).
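
The difference between the two gradient norms selected by L2gradient is easy to see on a single pixel. The sketch below only illustrates the two formulas; the function gradient_magnitude is a made-up helper, not an OpenCV call.

```python
import math


def gradient_magnitude(dx, dy, l2gradient=False):
    """Gradient magnitude from the x and y derivatives at one pixel."""
    if l2gradient:
        return math.hypot(dx, dy)  # L2 norm: sqrt(dx^2 + dy^2)
    return abs(dx) + abs(dy)       # L1 norm: |dx| + |dy|


print(gradient_magnitude(3, -4))                   # 7
print(gradient_magnitude(3, -4, l2gradient=True))  # 5.0
```

The L1 norm is cheaper to compute but overestimates the magnitude on diagonal edges, which shifts how the two hysteresis thresholds behave.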

Image Pre-Processing

Many may think that we should start processing directly, but that is not the case. To get the best out of an image or video frame, we first have to pre-process it. Pre-processing means augmenting or normalizing an image or video frame so that every input has the same form. For example, suppose you have to look at 4 images (1024x1024, 380x280, 100x100 and 400x50 pixels) and find which one shows a monkey. To process these images as they are, we would have to handle a different size each time, because the dimensions change from image to image. To solve this, we increase or decrease the number of pixels in each image until they all share one size. This process of increasing or decreasing the number of pixels is what we call pre-processing.

Image Processing Application 

Let us start. In this article, we will not interact with the pre-trained models that we downloaded in the previous article, because doing so requires the OpenVINO Python API, which we will discuss in the coming article. This article only shows how we process, or rather pre-process, an image or video frame.
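
    import cv2
    import numpy as np

    def preprocessing(input_image, height, width):
        # cv2.resize expects the target size as (width, height)
        image = cv2.resize(input_image, (width, height))
        # (height, width, channel) -> (channel, height, width)
        image = image.transpose((2, 0, 1))
        image = image.reshape(1, 3, height, width)  # add the batch dimension
        return image
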
In the above code, we resize the incoming frame to the desired height and width. After that, we transpose the image, which converts its layout from (height, width, channel) to (channel, height, width). Finally, we reshape the image into a [batch_size, number_of_channels, height, width] array, which is the input format required by the pre-trained models.
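
To see what the transpose and reshape steps do to the array, the following NumPy-only sketch traces the shape of a tiny made-up frame through both operations. The 4x6 size is chosen only to keep the numbers readable.

```python
import numpy as np

height, width = 4, 6  # tiny illustrative frame, not a real model input size

# A fake frame in OpenCV layout: (height, width, channel)
frame = np.arange(height * width * 3, dtype=np.uint8).reshape(height, width, 3)

chw = frame.transpose((2, 0, 1))          # (channel, height, width)
batch = chw.reshape(1, 3, height, width)  # (batch, channel, height, width)

print(frame.shape)  # (4, 6, 3)
print(chw.shape)    # (3, 4, 6)
print(batch.shape)  # (1, 3, 4, 6)

# The value at row 2, column 5, channel 1 is now found at [0, 1, 2, 5]
print(frame[2, 5, 1] == batch[0, 1, 2, 5])  # True
```

Only the order of the axes changes; no pixel value is modified along the way.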

    def pose_estimation(input_image):
        preprocessed_image = np.copy(input_image)
        # Height 256, width 456
        preprocessed_image = preprocessing(preprocessed_image, 256, 456)
        return preprocessed_image

    def text_detection(input_image):
        preprocessed_image = np.copy(input_image)
        # Height 768, width 1280
        preprocessed_image = preprocessing(preprocessed_image, 768, 1280)
        return preprocessed_image

    def car_meta(input_image):
        preprocessed_image = np.copy(input_image)
        # Height 72, width 72
        preprocessed_image = preprocessing(preprocessed_image, 72, 72)
        return preprocessed_image

In the above code, we pre-process each of the given images to the dimensions required by the corresponding model.
You can refer to the official documentation of the corresponding model to find the dimensions.
After writing the preprocessing code, we write the control logic. Our aim here is to check that the images load and that we are able to preprocess them.

    # Load the test images
    POSE_IMAGE = cv2.imread("sitting-on-car.jpg")
    TEXT_IMAGE = cv2.imread("sign.jpg")
    CAR_IMAGE = cv2.imread("blue-car.jpg")

    # Test names
    test_names = ["Pose Estimation", "Text Detection", "Car Meta"]

In the above code, we load the images and name the tests that we will perform.

    def set_solution_functions():
        # Map each test name to the function that produces the expected result
        global solution_funcs
        solution_funcs = {
            test_names[0]: pose_solution,
            test_names[1]: text_solution,
            test_names[2]: car_solution,
        }

    def pose_solution(input_image):
        return preprocessing(input_image, 256, 456)

    def text_solution(input_image):
        return preprocessing(input_image, 768, 1280)

    def car_solution(input_image):
        return preprocessing(input_image, 72, 72)

The above code initializes the required variables. The solution functions return the reference preprocessed images against which the tests compare their results.

    # function to preprocess the "sitting-on-car.jpg"
    def test_pose():
        comparison = test(pose_estimation, test_names[0], POSE_IMAGE)
        return comparison

    # function to preprocess the "sign.jpg"
    def test_text():
        comparison = test(text_detection, test_names[1], TEXT_IMAGE)
        return comparison

    # function to preprocess the "blue-car.jpg"
    def test_car():
        comparison = test(car_meta, test_names[2], CAR_IMAGE)
        return comparison

    # function to carry out the test on the passed image
    def test(test_func, test_name, test_image):
        # Run the function under test; report and stop if it raises
        try:
            s_processed = test_func(test_image)
        except Exception:
            print_exception(test_name)
            return

        # Compare against the reference solution for this test
        solution = solution_funcs[test_name](test_image)
        comparison = np.array_equal(s_processed, solution)
        print_test_result(test_name, comparison)

        return comparison

In the above code, we declare all the test functions. Each test-specific function passes the function under test, the name of the test, and the image to the "test" function.
In the "test" function, we check whether the function under test produces any output; if it raises an exception instead, we print a failure message for the corresponding test. To pass the test, the preprocessed image array must be identical to the one returned by the corresponding solution function.

    def print_exception(test_name):
        print("Failed to run test on {}.".format(test_name))
        print("The code should be valid Python and return the preprocessed image.")

    def print_test_result(test_name, result):
        if result:
            print("Passed {} test.".format(test_name))
        else:
            print("Failed {} test, did not obtain expected preprocessed image.".format(test_name))

The above are the helper functions needed by the "test" function.

    def feedback(tests_passed):
        print("You passed {} of 3 tests.".format(int(tests_passed)))
        if tests_passed == 3:
            print("Congratulations!")
        else:
            print("See above for additional feedback.")

The above code prints the final result: if you pass all 3 tests it prints "Congratulations!"; otherwise it asks you to look at the feedback above to see which tests did not pass.
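
The snippets in this article do not show the code that ties the tests and the feedback together. One possible way to do it is sketched below; the helper name count_passed is an assumption and is not taken from the attached scripts. It counts a test as passed only when it returns a true value, since "test" returns nothing when an exception was caught.

```python
def count_passed(tests):
    """Call each zero-argument test function and count those that passed."""
    passed = 0
    for run_test in tests:
        if run_test():  # None (exception caught) and False both count as failed
            passed += 1
    return passed


# Stand-ins for test_pose, test_text and test_car, for demonstration only
demo_tests = [lambda: True, lambda: None, lambda: True]
print(count_passed(demo_tests))  # 2
```

In the script from this article, this could be called as `feedback(count_passed([test_pose, test_text, test_car]))`, after `set_solution_functions()` has been run so that the solution lookup exists.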
When all three tests pass, the output is:

    Passed Pose Estimation test.
    Passed Text Detection test.
    Passed Car Meta test.
    You passed 3 of 3 tests.
    Congratulations!

I have attached all 3 images that I used and the Python scripts with proper formatting and comments. 


In this article, I discussed how we pre-process an image so that we can get the maximum benefit from it. In the coming articles, I will tell you how we can combine the pre-trained models and the preprocessing to make an application that can annotate an image with its characteristics.