Read PDF With OCR Activity Using UiPath Studio

Robotic Process Automation (RPA) is a technology that lets anyone configure computer software, or a "robot", to emulate and integrate the actions of a human interacting with digital systems in order to execute a business process.
 
RPA robots use the user interface to capture data and manipulate applications just like humans do. They interpret, trigger responses, and communicate with other systems in order to perform a wide variety of repetitive tasks. Unlike a human, an RPA software robot never sleeps and carries out its configured steps the same way every time.
 
UiPath is a leading Robotic Process Automation vendor providing a complete software platform to help organizations efficiently automate business processes.
 
UiPath Studio is a tool that can model an organization's business processes in a visual way.
 
The Read PDF With OCR activity is used to extract data from PDF documents that contain both text and images. If the document has images in addition to plain text, this activity also reads the text inside those images and returns everything as text output.
 
In this article, you will learn how to automate extracting text from a PDF document that contains both plain text and images with text, using the Read PDF With OCR activity and related activities in UiPath Studio Pro Community.
 
The following tools are required for developing UiPath bots:
  1. Windows 7/8.1/10 (Recommended)
  2. UiPath Studio Pro - Community Cloud (free software, available online at https://www.uipath.com/start-trial)
Now let us go through the bot development step by step.
 
Step 1
 
Open UiPath Studio -> Start -> New Project -> click Process.
 
Step 2
 
Now, create a New Blank Process, name it UiPdfImage, and give it a description.
 
Step 3
 
Next, to extract both the plain text and the text inside images from a PDF document, create a new Sequence workflow named GetImagePDF.
 
Next, install the PDF package: go to Manage Packages, select Official, then select UiPath.PDF.Activities and install it.
 
After installing the package, click Activities -> search for the Read PDF With OCR activity -> drag and drop it into the sequence and select the PDF file.
 
The sample PDF used here contains both plain text and an image that itself contains text.
 
Create a String variable named extractimage. In the activity's properties, set Range to 1 (so that only the first page is read) and set the output Text to extractimage.
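To make the Range setting more concrete, here is a minimal Python sketch of how a page-range string could be expanded into page numbers. It is only an illustration, not UiPath's implementation. The function name parse_page_range is hypothetical, and the accepted formats (a single page such as 1, a span such as 7-12, a comma-separated mix, End for the last page, and All) are an assumption about how such a property is typically written.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a page-range string into a sorted list of 1-based page numbers.

    Hypothetical helper for illustration only; the formats handled here
    ("All", "7", "7-12", "2-5, 7, 15-End") are an assumption.
    """
    spec = spec.strip()
    if spec.lower() == "all":
        return list(range(1, page_count + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A span such as "7-12" or "15-End".
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            if end_text.strip().lower() == "end":
                end = page_count
            else:
                end = int(end_text)
        else:
            # A single page such as "7".
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

With this sketch, the value used in this step, 1, selects only the first page, while All would select every page of the document.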
 
Click Activities -> search for the Tesseract OCR engine activity -> drag and drop it into the OCR engine area of the Read PDF With OCR activity.
 
Click Activities -> search for the Write Text File activity -> drag and drop it into the sequence, then set its FileName property to the output file path and its Text property to extractimage.
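For readers who think in code, the sequence built above (run an OCR engine over the selected pages, keep the result in a String variable, write it to a text file) can be sketched in Python as follows. This is a rough analogy, not what UiPath runs internally. All function names are hypothetical, and the OCR engine is passed in as a plain callable so the sketch stays independent of any particular OCR library.

```python
from pathlib import Path
from typing import Callable, Iterable


def extract_text_with_ocr(page_images: Iterable[object],
                          ocr_engine: Callable[[object], str]) -> str:
    """Run the OCR engine on each page image and join the page texts.

    Hypothetical stand-in for the Read PDF With OCR step; `ocr_engine`
    plays the role of the OCR engine dropped into the activity.
    """
    return "\n".join(ocr_engine(image).strip() for image in page_images)


def write_text_file(file_name: str, text: str) -> None:
    """Hypothetical stand-in for the Write Text File step."""
    Path(file_name).write_text(text, encoding="utf-8")


def run_workflow(page_images: Iterable[object],
                 ocr_engine: Callable[[object], str],
                 file_name: str) -> str:
    """Mirror the sequence: OCR the pages, store the text, write the file."""
    extractimage = extract_text_with_ocr(page_images, ocr_engine)
    write_text_file(file_name, extractimage)
    return extractimage
```

In an actual Python setup, the page images and the OCR callable would have to come from real libraries, for example pdf2image's convert_from_path and pytesseract's image_to_string. That pairing is an assumption for illustration and is not part of the UiPath workflow.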
 
Step 4
 
To run your project, select Debug File -> Run. The UiPdfImage project writes the extracted text to the file specified in the Write Text File activity.
 

Summary

 
You have now successfully automated extracting both the plain text and the text inside images from a PDF document using UiPath Studio.

