Reading And Writing The Text From A Scanned PDF Using RPA

Introduction

 
When updating records and important documents, we often scan them and store them as PDF files. In some processes, we then need to read the text from the scanned document and store it in a Word document or an Excel sheet. You can automate such processes using RPA (Robotic Process Automation). In this article, we will walk through automating the reading of text from a scanned PDF file and storing it. A message box is used to display the content read from the PDF file. Reading images from PDF files will be covered in a future article.
 
Tools Used
 
Here, I am using UiPath Studio to implement this concept. It is UiPath's tool for building Robotic Process Automation workflows, and it helps you automate your day-to-day activities and work processes. You can install the Community edition of UiPath Studio for free from UiPath's official website. For installation steps, you can refer to my article on the introduction to UiPath StudioX. The only change you need to make is to choose Studio instead of StudioX; the rest of the process remains the same.
 

Steps For Creating the PDF Reading Automation Flow

 
Creating a new process
 
You can create a new process by clicking the Process option under New Project.
 
 
After you click it, a dialog box opens. Enter the name, location, and description of the process, and then click the Create button.
 
 
The project loads and the process page opens.
 
Creating Workflow
 
On the Design page, click the Open Main Workflow option.
 
 
The Design page opens. There you can add a flowchart by clicking New and then Flowchart.
 
 
A dialog box for the new flowchart then opens. Enter the name and location of the flowchart, and then click the Create button.
 
 
A new flowchart page is created.
 

Creating Sequence

 
Next, create a new sequence inside the flowchart by dragging the Sequence activity from the Activities pane and dropping it onto the flowchart. If it is not listed under the recent activities, you can find it using the search box in the Activities pane. Once dropped, a new sequence is created. You can rename the sequence and other activities by pressing the F2 key.
 
 
Preparing tool for execution
 
Make sure that the required dependencies are installed in your project. You can view or add them using the Manage Packages option.
 
 
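The required packages are shown in the image below. In particular, the Read PDF With OCR activity used later is part of the UiPath.PDF.Activities package.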
 
 
Adding activities to the Sequence
 
To read the PDF file, I am using the Read PDF With OCR activity with Google OCR as the OCR engine. The OCR engine is what lets the activity recognize text in a scanned PDF. After inserting the activity, set the file location and name by clicking the browse button on it.
 
To write the text read from the PDF, I am using a Write Text File activity. After adding it, set its Text property to readPDFTxt, the variable that holds the text read from the PDF. Then specify the name of the file in which the text should be stored.
 
To display the text from the PDF, I am using a Message Box activity with readPDFTxt as the message.
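
For readers who think in code, the three activities above can be sketched roughly as follows in Python. This is not how UiPath implements them; it is only an illustration of the data flow. The names `read_and_store_pdf_text`, `OcrEngine`, and `fake_ocr_engine` are hypothetical, and the stand-in engine returns a fixed string instead of performing real OCR, much like choosing an OCR engine inside the Read PDF With OCR activity.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical type: any function that takes a PDF path and returns its text.
OcrEngine = Callable[[Path], str]


def fake_ocr_engine(pdf_path: Path) -> str:
    """Stand-in for a real OCR engine (assumption); it does not open the file."""
    return f"text recognised in {pdf_path.name}"


def read_and_store_pdf_text(pdf_path: str, output_path: str, ocr_engine: OcrEngine) -> str:
    # Step 1, like Read PDF With OCR: run the chosen engine on the file.
    read_pdf_txt = ocr_engine(Path(pdf_path))
    # Step 2, like Write Text File: store the recognised text in a file.
    Path(output_path).write_text(read_pdf_txt, encoding="utf-8")
    # Step 3, like Message Box: show the same text to the user.
    print(read_pdf_txt)
    return read_pdf_txt


if __name__ == "__main__":
    # Write to a temporary folder so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as folder:
        output_file = Path(folder) / "output.txt"
        read_and_store_pdf_text("scanned.pdf", str(output_file), fake_ocr_engine)
```

The same variable feeds both the file write and the display step, which mirrors how readPDFTxt is reused by the Write Text File and Message Box activities.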
 
 

Running the Flow

 
You can run the flow by clicking the Debug File button, which runs it in debug mode.
 
 
After the run, the text from the PDF is displayed in a message box, and a text file containing the PDF content is saved. This confirms that the flow ran successfully.
 
 

Conclusion

 
This is one of the most common automation tasks in RPA. I will cover more features of the UiPath RPA tool in future articles. If you have any questions about this article, feel free to contact me.

