OCR Functionality Through MODI for Extracting Text Information from Image Files in VB.NET

OCR stands for Optical Character Recognition. With it we can extract text and layout information from image files such as MDI and TIFF. When you scan a paper page into a computer, the result is just an image file, a photo of the page. The computer cannot interpret the letters in that image; OCR converts it into a text or word processor file, so that the text can be read and edited.

OCR can be performed with the Microsoft Office Document Imaging (MODI) object model. To use it, add the MODI library to your development project. The MODI object model consists of the following objects:
 

               Document object:     Represents an ordered collection of pages (images).
               Image object:        Represents a single page of a document.
               Layout object:       Represents the results of optical character recognition (OCR) on a page.
               MiDocSearch object:  Exposes document search functionality.
               Viewer control:      An ActiveX control that displays the pages of a document.

  Example of extracting text from a TIFF file:
 

        Dim strWordInfo As String = ""
        Dim docs As New MODI.Document

        ' Load the image file into a MODI document
        docs.Create("C:\test.tif")

        Dim success As Boolean = Analyse(docs)

        If success Then
            ' Append the recognized text of each page in turn
            For j As Integer = 0 To docs.Images.Count - 1
                strWordInfo = strWordInfo & " " & docs.Images(j).Layout.Text
            Next

            ' Double any single quotes in the extracted text
            strWordInfo = strWordInfo.Replace("'", "''")
        End If

        ' Release the image file
        docs.Close(False)

        ' Runs OCR on the document; returns True on success, False otherwise
        Function Analyse(ByVal Doc As MODI.Document) As Boolean
            If Doc Is Nothing Then
                Return False
            End If

            Try
                ' MODI call for OCR. Arguments: language, auto-rotate the image, straighten the image
                Doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, True, True)
                Return True
            Catch ex As Exception
                ' MessageBox.Show("OCR was successful but no text was recognized")
                Return False
            End Try
        End Function
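
Besides the full page text, the Layout object also gives access to the individual recognized words. The following is a minimal sketch, not a definitive implementation. It assumes that OCR has already been run on the document and that the project references the MODI type library. The module name WordListing and the helper ListWords are hypothetical names chosen for illustration.

```vbnet
Imports System

Module WordListing

    ' Hypothetical helper: prints every recognized word with its confidence value.
    ' Assumes Doc.OCR(...) has already completed successfully on this document.
    Sub ListWords(ByVal Doc As MODI.Document)
        For i As Integer = 0 To Doc.Images.Count - 1
            ' Cast to the typed interface so the members below resolve at compile time
            Dim page As MODI.Image = DirectCast(Doc.Images(i), MODI.Image)
            Dim layout As MODI.Layout = page.Layout

            Console.WriteLine("Page {0}: {1} words", i + 1, layout.NumWords)

            ' Each Word carries its text and the engine's confidence in the recognition
            For Each w As MODI.Word In layout.Words
                Console.WriteLine("  {0} (confidence {1})", w.Text, w.RecognitionConfidence)
            Next
        Next
    End Sub

End Module
```

Calling ListWords(docs) after a successful Analyse(docs) would list the words page by page. Low confidence values can point to words worth checking by hand.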

Note: None of the above works unless the project has a reference to the "Microsoft Office Document Imaging Type Library". The version to add depends on the installed Office release:

 Microsoft Office 2003: add "Microsoft Office Document Imaging 11.0 Type Library"
 Microsoft Office 2007: add "Microsoft Office Document Imaging 12.0 Type Library"

