Processing Images using Intel OpenVINO

Rohit Gupta

Introduction

In this article, we will learn to write Python scripts that pre-process images. Since we will use OpenCV to do so, I will also discuss OpenCV and the OpenCV functions that we need.

OpenCV

OpenCV is a programming library aimed primarily at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and publicly available under an open-source license (BSD for older releases, Apache 2 since version 4.5.0). It was first released in June 2000.

OpenCV Python API Commands

Some OpenCV commands that we will be using from this article onwards:

1. To Read the Video/Image

cv2.VideoCapture()
To initialize the VideoCapture object
cv2.VideoCapture(filename)
To load the filename into the VideoCapture object
cv2.VideoCapture(device)
To load the device id into the VideoCapture object
cv2.VideoCapture.open(filename)
To load and open the filename into the VideoCapture object
cv2.VideoCapture.open(device)
To load and open the device into the VideoCapture object
cv2.VideoCapture.release()
To close the video file or capturing device
cv2.VideoCapture.read([image])
To grab, decode and return the next video frame, together with a flag that tells whether a frame was read

where

filename
Name of the opened video file (ex. video.avi) or an image sequence (ex. img_%02d.jpg, which will read samples img_00.jpg, img_01.jpg, and so on)
device
Id of the opened video capturing device (i.e. a camera index). If there is a single camera connected, just pass 0.

2. To Write the Video/Image

cv2.VideoWriter()
To create default VideoWriter Object
cv2.VideoWriter([filename, fourcc, fps, framesize [,isColor]])
To create parameterized VideoWriter Object
cv2.VideoWriter.isOpened()
To check whether the VideoWriter object has been initialized successfully
cv2.VideoWriter.open([filename, fourcc, fps, framesize [,isColor]])
To open the VideoWriter object to start writing
cv2.VideoWriter.write(image)
To write the next frame to the video

where

filename
Name of the output video file
framesize
Size of the video frame
fps
Frame rate of the created/loaded video
fourcc
4 character code of codec used to compress the frames
isColor
If not zero, the encoder will expect and encode color frames, otherwise it will work with grayscale

3. To Change the Size

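dst = cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])

where

src
Input image (a NumPy array, for example the result of cv2.imread)
dst
Output image
dsize
Size of the output image as (width, height). If it is None, it is computed as
dsize = size(round(fx*src.cols), round(fy*src.rows))
fx
Scale factor along the horizontal axis. If it is 0, it is computed as
fx = (double) dsize.width/src.cols
fy
Scale factor along the vertical axis. If it is 0, it is computed as
fy = (double) dsize.height/src.rows
interpolation
Interpolation method, for example cv2.INTER_LINEAR (the default), cv2.INTER_AREA or cv2.INTER_CUBIC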

4. To Change the Color of Image/Frame of Video

dst = cv2.cvtColor(src, code[, dst[, dstCn]])

where

src
Input image
dst
Output image
code
Color space conversion code

cv2.COLOR_BGR2GRAY
To convert BGR to Gray
cv2.COLOR_RGB2GRAY
To convert RGB to Gray
cv2.COLOR_GRAY2BGR
To convert Gray to BGR
cv2.COLOR_GRAY2RGB
To convert Gray to RGB

5. To Construct Rectangle

cv2.rectangle(img, pt1, pt2, color [, thickness [, lineType [, shift ]]] )
cv2.rectangle(img, rec, color [, thickness [, lineType [, shift ]]] )

where

img
Image on which the rectangle is drawn (the image is modified in place)
pt1
Vertex of rectangle
pt2
Vertex of rectangle opposite to pt1
color
Rectangle color or brightness
thickness
Thickness of lines
lineType
Type of line
shift
Number of fractional bits in the point coordinates

6. To Read an Image

cv2.imread(filename [, flags ])

where

flags
- cv2.IMREAD_COLOR
- cv2.IMREAD_ANYDEPTH
- cv2.IMREAD_GRAYSCALE
- >0, return a 3-channel color image
- =0, return a grayscale image
- <0, return the loaded image as it is

7. To Write an Image

cv2.imwrite(filename, img[, params])

where

params

cv2.IMWRITE_JPEG_QUALITY, value can be between 0 and 100
cv2.IMWRITE_PNG_COMPRESSION, value can be between 0 and 9
cv2.IMWRITE_PXM_BINARY, value can be 0 or 1

8. For Canny Edge Detection

edges = cv2.Canny(image, threshold1, threshold2[, edges[, apertureSize[, L2gradient]]])
edges = cv2.Canny(dx, dy, threshold1, threshold2[, edges[, L2gradient]])

where

image
Single-channel 8-bit input image
edges
Output edge map, it has the same size and type as image
threshold1
1st threshold for the hysteresis procedure
threshold2
2nd threshold for the hysteresis procedure
apertureSize
Aperture size for the Sobel() operator
L2gradient
A flag indicating whether the more accurate L2 norm should be used to calculate the image gradient magnitude (L2gradient=True), or whether the default L1 norm is enough (L2gradient=False).

Image Pre-Processing

Many may think that we should start processing directly, but that is not the case. To get the best out of an image or video frame, we first have to pre-process it. Pre-processing means augmenting or normalizing an image or video frame so that all inputs have the same form. For example, suppose you have to look at 4 images (1024x1024, 380x280, 100x100 and 400x50 pixels) and find which one shows a monkey. To process these images as they are, we would have to handle a different image size each time. To avoid this, we increase or decrease the number of pixels of each image so that they all have the same size; this resizing is one example of what we call pre-processing.

Image Processing Application

Let us start. In this article, we will not interact with the pre-trained models that we downloaded in the previous article, because doing so requires the OpenVINO Python API, which we will discuss in the coming article. This article only shows how we process, or rather pre-process, an image or video frame.

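import cv2
import numpy as np

def preprocessing(input_image, height, width):
    # cv2.resize expects the target size as (width, height)
    image = cv2.resize(input_image, (width, height))
    # (height, width, channel) -> (channel, height, width), then add the batch dimension
    image = image.transpose((2, 0, 1))
    image = image.reshape(1, 3, height, width)
    return image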

In the above code, we resize the incoming frame to the desired height and width. After that, we perform the transpose operation, which converts the image from (height, width, channel) to (channel, height, width) format. Finally, we reshape the image to a [batch_size, number_of_channels, height, width] array, which is the input format required by the pre-trained models.
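To see what the transpose and reshape steps do to the array layout, here is a small NumPy-only sketch. The tiny 2x3 array is a stand-in for a resized frame and is only meant for illustration.

```python
import numpy as np

height, width = 2, 3
# Stand-in for a resized BGR frame in (height, width, channel) layout
hwc = np.arange(height * width * 3).reshape(height, width, 3)

chw = hwc.transpose((2, 0, 1))           # (channel, height, width)
nchw = chw.reshape(1, 3, height, width)  # (batch, channel, height, width)

print(hwc.shape, chw.shape, nchw.shape)

# A pixel keeps its value, only its index moves
y, x, c = 1, 2, 0
print(hwc[y, x, c] == nchw[0, c, y, x])
```

The values are not changed by these two steps; only the order of the axes changes, and a batch axis of size 1 is added in front.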

def pose_estimation(input_image):
    preprocessed_image = np.copy(input_image)
    preprocessed_image = preprocessing(preprocessed_image, 256, 456)
    return preprocessed_image

def text_detection(input_image):
    preprocessed_image = np.copy(input_image)
    preprocessed_image = preprocessing(preprocessed_image, 768, 1280)
    return preprocessed_image

def car_meta(input_image):
    preprocessed_image = np.copy(input_image)
    preprocessed_image = preprocessing(preprocessed_image, 72, 72)
    return preprocessed_image

In the above code, we pre-process each of the given images to the dimensions required by the corresponding model.

Note:

You can refer to the official documentation of the corresponding model to find the dimensions.

After we finish writing the code for preprocessing, we write the control logic. Here, our aim is to check that the images load and that we are able to preprocess them.

# Load the test images
POSE_IMAGE = cv2.imread("sitting-on-car.jpg")
TEXT_IMAGE = cv2.imread("sign.jpg")
CAR_IMAGE = cv2.imread("blue-car.jpg")
# Test names
test_names = ["Pose Estimation", "Text Detection", "Car Meta"]

In the above code, we load the images and assign the name of the test that we will perform.

def set_solution_functions():
    global solution_funcs
    solution_funcs = {
        test_names[0]: pose_solution,
        test_names[1]: text_solution,
        test_names[2]: car_solution,
    }

def pose_solution(input_image):
    return preprocessing(input_image, 256, 456)

def text_solution(input_image):
    return preprocessing(input_image, 768, 1280)

def car_solution(input_image):
    return preprocessing(input_image, 72, 72)

The above code initializes the required variables: the dictionary maps each test name to a solution function that returns the expected preprocessed image.

# function to preprocess the "sitting-on-car.jpg"
def test_pose():
    comparison = test(pose_estimation, test_names[0], POSE_IMAGE)
    return comparison

# function to preprocess the "sign.jpg"
def test_text():
    comparison = test(text_detection, test_names[1], TEXT_IMAGE)
    return comparison

# function to preprocess the "blue-car.jpg"
def test_car():
    comparison = test(car_meta, test_names[2], CAR_IMAGE)
    return comparison

# function to carry out the test on the passed image
def test(test_func, test_name, test_image):
    try:
        s_processed = test_func(test_image)
    except Exception:
        # the function under test failed, so report it and stop this test
        print_exception(test_name)
        return
    solution = solution_funcs[test_name](test_image)
    comparison = np.array_equal(s_processed, solution)
    print_test_result(test_name, comparison)
    return comparison

In the above code, we declare all the test functions. Each test-specific function passes the function to be tested, the name of the test, and the loaded image to the "test" function.

In the "test" function, we check whether the function under test produces any output; if it raises an error instead, we print an exception message for the corresponding test. The criterion to pass the test is that the preprocessed image array should be the same as the array returned by the corresponding solution function.

def print_exception(test_name):
    print("Failed to run test on {}.".format(test_name))
    print("The code should be valid Python and return the preprocessed image.")

def print_test_result(test_name, result):
    if result:
        print("Passed {} test.".format(test_name))
    else:
        print("Failed {} test, did not obtain expected preprocessed image.".format(test_name))

The above are the helper functions needed by the "test" function.

def feedback(tests_passed):
    print("You passed {} of 3 tests.".format(int(tests_passed)))
    if tests_passed == 3:
        print("Congratulations!")
    else:
        print("See above for additional feedback.")

The above code prints the final output: if you pass all 3 tests it prints "Congratulations!", otherwise it asks you to look at the feedback above to see which tests failed.
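The snippets above do not show how the tests are started and counted. One possible way, sketched here with a hypothetical helper run_tests that is not part of the attached scripts, is to call each test function and count how many returned True. In the full script, set_solution_functions() would presumably be called first, and the count would then be passed to feedback.

```python
def run_tests(tests):
    """Call each test function and count how many returned True."""
    # A test that failed with an exception returns None, which counts as not passed
    return sum(bool(test()) for test in tests)


if __name__ == "__main__":
    # Stand-ins for test_pose, test_text and test_car
    demo_tests = [lambda: True, lambda: True, lambda: None]
    print(run_tests(demo_tests))
```

Converting each result with bool() matters here, because the "test" function returns None when the function under test raises an error.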

Output

Passed Pose Estimation test.

Passed Text Detection test.

Passed Car Meta test.

You passed 3 of 3 tests.

Congratulations!

I have attached all 3 images that I used and the Python scripts with proper formatting and comments.

Conclusion

In this article, I discussed how we pre-process an image so that we can get the maximum benefit from it. In the coming articles, I will tell you how we can combine the pre-trained models and the preprocessing to make an application that can annotate an image with its characteristics.

MCN Solutions Pvt. Ltd.

Technical Lead