ABBYY FlexiCapture Engine - Read Text From PDF Or Image File

ABBYY FlexiCapture Engine is a software development kit (SDK) for extracting data from documents such as PDFs, images, and scans. Developers can use it to pull text out of letters, invoices, forms, identity cards, and similar documents. Besides capturing text, it can also perform high-quality image processing.

Note: ABBYY FlexiCapture Engine is not open source, and you need a license to use it. You can request a trial license from your business e-mail address.

FlexiLayout Studio

ABBYY FlexiLayout Studio is an application for describing where the text is located in structured, semi-structured, and unstructured documents. It provides tools such as Block, Element, Group, and Checkmark for marking the areas from which data should be extracted. Once you have marked all the required text and values, you can export the result as an AFL file (.afl extension).

Getting text from a complex document using FlexiLayout Studio

  1. Install FlexiLayout Studio, which is included with the ABBYY SDK.

  2. Open the application and create a project.

  3. Add your file (image, PDF, scan, etc.) using the Add Images button.

  4. Double-click your file to display it in the documents area.

  5. Create a document as a training set.

  6. Suppose you need to extract the Policy Number, Claim No, Insured Name, and similar fields from the document. In a FlexiLayout, blocks and elements mark the areas of the document to capture, so create a block and an element for each field you need.

  7. To create an element, select the element tool and drag it over the area you want to capture. To keep related data together, create a group and add the elements to that group.

  8. Create a value for the policy number, then do the same for every other required data field.

  9. Create a relation for each element so that its value can be accessed by name in the block section.

    Each element needs a relation with all of its value elements, and each value element is then referenced by a block. This makes the value accessible from programming languages such as C#, VB, and C++.

  10. Create a block for accessing the element.

    Repeat this for every element so that each one is referenced by a block. The resulting element and block sets are what the application uses to read the field values.

  11. Export the project and save the AFL (ABBYY FlexiLayout) file under a name of your choice.

    This completes the AFL project, which describes how to find the data in the document. The next section uses this AFL file from a C# program to extract the data.

Using the FlexiLayout file in a Windows or web application to access the data

  1. Create the Windows or web application in which you want to access the data. This walkthrough creates a Windows application.

    • Open Visual Studio.
    • Open the File menu and choose New Project.
    • Select Windows Forms Application, type a name for the project, and click OK.

  2. Design the form. It needs two text boxes, each with a browse button, to locate the FlexiLayout file (.afl extension) and the document whose data you want to extract, plus a button that starts the extraction and a label that shows the result.

  3. Add a reference to FCEngine.dll, which comes with the ABBYY SDK, to the project.

  4. Add the following code to the form's code-behind file and run the application.

    using FCEngine;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.InteropServices;
    using System.Text;
    using System.Windows.Forms;

    namespace AflToDocDiff {
        // FceConfig and FieldParams are not defined in this listing. FceConfig must supply
        // the developer serial number and the path to the engine DLL. FieldParams holds the
        // per-field recognition settings used by tuneRecognitionParams.
        public partial class Form1 : Form {
            IEngine engine = null;
            IFlexiCaptureProcessor processor;
            IDocumentDefinition definition;
            StringBuilder sb;

            // Recognition settings keyed by definition name, then by field name.
            private Dictionary<string, Dictionary<string, FieldParams>> fieldToType =
                new Dictionary<string, Dictionary<string, FieldParams>>();

            public Form1() {
                InitializeComponent();
            }

            [DllImport(FceConfig.DllPath, CharSet = CharSet.Unicode), PreserveSig]
            internal static extern int InitializeEngine(String devSN, out IEngine engine);

            [DllImport(FceConfig.DllPath, CharSet = CharSet.Unicode), PreserveSig]
            internal static extern int DeinitializeEngine();

            // Loads the engine and throws if initialization fails.
            private IEngine loadEngine() {
                IEngine engine;
                int hresult = InitializeEngine(FceConfig.GetDeveloperSN(), out engine);
                Marshal.ThrowExceptionForHR(hresult);
                return engine;
            }

            // Releases the engine reference and unloads the engine.
            private void unloadEngine(ref IEngine engine) {
                engine = null;
                int hResult = DeinitializeEngine();
                Marshal.ThrowExceptionForHR(hResult);
            }

            // Creates a document definition from an .xfd, .afl, or .fcdot file.
            private IDocumentDefinition loadDefinition(string path) {
                string extension = Path.GetExtension(path).ToLowerInvariant();
                switch (extension) {
                    case ".xfd":
                        return engine.CreateDocumentDefinitionFromXFD(path, "English");
                    case ".afl":
                        return engine.CreateDocumentDefinitionFromAFL(path, "English");
                    case ".fcdot":
                        IDocumentDefinition loaded = engine.CreateDocumentDefinition();
                        ((ICustomStorage) loaded).LoadFromFile(path);
                        return loaded;
                    default:
                        throw new NotSupportedException("Unsupported definition file type: " + extension);
                }
            }

            // Recognizes the selected document and shows the extracted fields.
            private void button1_Click(object sender, EventArgs e) {
                try {
                    engine = loadEngine();
                    definition = loadDefinition(txtAflFilePath.Text);

                    // Optional: tune recognition and save the definition for reuse.
                    // tuneRecognitionParams("1", definition);
                    // ((ICustomStorage) definition).SaveToFile(@"D:\AflToDocDiff\AflToDocDiff\FcdotDoc\Test1.fcdot");

                    // The engine is loaded and unloaded on every click,
                    // so a new processor is created each time.
                    processor = engine.CreateFlexiCaptureProcessor();
                    processor.AddDocumentDefinition(definition);

                    // Add every page of the selected file to the processor.
                    var imageTools = engine.CreateImageProcessingTools();
                    var file = imageTools.OpenImageFile(txtFormPath.Text);
                    int pageCount = file.PagesCount;
                    for (int i = 0; i < pageCount; i++) {
                        processor.AddImage(file.OpenImagePage(i));
                    }

                    IDocument document = processor.RecognizeNextDocument();
                    if (document != null && document.DocumentDefinition != null) {
                        sb = new StringBuilder();
                        buildDocumentView(document);
                        // To keep a copy of the result, write sb.ToString() to a file
                        // with File.WriteAllText.
                    }
                } catch (Exception ex) {
                    // Report the failure instead of silently discarding it.
                    MessageBox.Show(ex.Message);
                } finally {
                    // Drop references to engine objects before unloading the engine.
                    processor = null;
                    definition = null;
                    if (engine != null) {
                        unloadEngine(ref engine);
                    }
                }
            }

            // Applies the settings stored in fieldToType[name] to the fields of the
            // first section of the definition.
            private void tuneRecognitionParams(string name, IDocumentDefinition definition) {
                if (fieldToType.ContainsKey(name)) {
                    bool modified = false;
                    var fields = definition.Sections[0].Fields;
                    for (int i = 0; i < fields.Count; i++) {
                        if (fieldToType[name].ContainsKey(fields[i].Name)) {
                            var fieldParams = fieldToType[name][fields[i].Name];
                            var textParams = fields[i].RecognitionParams.AsTextParams();
                            if (fieldParams.Type == FieldValueTypeEnum.FVT_Text) {
                                if (fieldParams.RegExp != null) {
                                    // Params is expected to hold "<letters> <regular expression>".
                                    int pos = fieldParams.Params.IndexOf(' ');
                                    string letters = fieldParams.Params.Substring(0, pos);
                                    string regExp = fieldParams.Params.Substring(pos + 1);
                                    var newLanguage = textParams.CreateEmbeddedLanguage(LanguageTypeEnum.LT_Simple, null);
                                    newLanguage.AsSimpleLanguage().set_LetterSet(LanguageLetterSetEnum.LLS_Alphabet, letters);
                                    newLanguage.AsSimpleLanguage().RegularExpression = regExp;
                                    textParams.Language = newLanguage;
                                }
                            } else {
                                var newLanguage = textParams.CreateEmbeddedLanguageByDataType(fieldParams.Type);
                                textParams.Language = newLanguage;
                            }
                            textParams.TextType = fieldParams.TextType;
                            textParams.CaseRecognitionMode = fieldParams.CaseType;
                            modified = true;
                        }
                    }
                    if (modified) {
                        definition.Check();
                    }
                }
            }

            // Browse for the FlexiLayout (or other definition) file.
            private void button2_Click(object sender, EventArgs e) {
                DialogResult result = openFileDialog1.ShowDialog();
                if (result == DialogResult.OK) {
                    txtAflFilePath.Text = openFileDialog1.FileName;
                }
            }

            // Browse for the document to process.
            private void button3_Click(object sender, EventArgs e) {
                DialogResult result = openFileDialog2.ShowDialog();
                if (result == DialogResult.OK) {
                    txtFormPath.Text = openFileDialog2.FileName;
                }
            }

            // Walks the fields of the first section of the recognized document.
            private void buildDocumentView(IDocument document) {
                IField firstSection = document.Sections[0];
                addDocumentNodeChildren(firstSection.Children);
            }

            private void addDocumentNodeChildren(IFields children) {
                for (int i = 0; i < children.Count; i++) {
                    addDocumentNode(children[i]);
                }
            }

            private void addDocumentNode(IField documentNode) {
                IFieldValue value = documentNode.Value;
                if (value != null) {
                    sb.AppendLine(string.Format("{0}:{1}", documentNode.Name, value.AsString));
                    string formName = value.AsString ?? "";
                    if (formName.Contains("125")) {
                        lblFormName.Text = "Accord Form 125";
                    } else if (formName.Contains("126")) {
                        lblFormName.Text = "Accord Form 126";
                    } else if (formName.Contains("140")) {
                        lblFormName.Text = "Accord Form 140";
                    } else {
                        lblFormName.Text = "Please upload valid form.";
                    }
                }
                // Show every field collected so far. This replaces the form name set above.
                lblFormName.Text = sb.ToString();
                if (documentNode.Instances != null) {
                    addDocumentNodeInstances(documentNode.Instances);
                } else if (documentNode.Children != null) {
                    addDocumentNodeChildren(documentNode.Children);
                }
            }

            private void addDocumentNodeInstances(IFieldInstances instances) {
                for (int i = 0; i < instances.Count; i++) {
                    if (instances[i].Children != null) {
                        addDocumentNodeChildren(instances[i].Children);
                    }
                }
            }
        }
    }
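
The listing walks the recognized fields recursively and appends each one to a string. If you would rather look values up by the names you chose in FlexiLayout Studio, the same walk can fill a dictionary instead. The following is a minimal sketch of that idea, not ABBYY API code: FieldNode is a hypothetical stand-in for a recognized field (a name, an optional value, and child fields), and FieldTree.Flatten is an illustrative helper name.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a recognized field. It is not an ABBYY type.
public sealed class FieldNode
{
    public string Name { get; }
    public string Value { get; }
    public List<FieldNode> Children { get; } = new List<FieldNode>();

    public FieldNode(string name, string value = null)
    {
        Name = name;
        Value = value;
    }
}

public static class FieldTree
{
    // Walks the tree depth-first and records every field that has a value.
    // If a name occurs more than once, the first value found is kept.
    public static Dictionary<string, string> Flatten(FieldNode root)
    {
        var result = new Dictionary<string, string>();
        Collect(root, result);
        return result;
    }

    private static void Collect(FieldNode node, Dictionary<string, string> result)
    {
        if (node.Value != null && !result.ContainsKey(node.Name))
        {
            result.Add(node.Name, node.Value);
        }
        foreach (FieldNode child in node.Children)
        {
            Collect(child, result);
        }
    }
}
```

With a tree built this way, FieldTree.Flatten(root)["Policy Number"] returns the value stored for the field named Policy Number. In a real application the nodes would be filled from the engine's field objects rather than constructed by hand.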

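Output

After you select the FlexiLayout file and the document and start the extraction, the label on the form lists each recognized field as name:value, one field per line.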
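
The listing chooses a form label by checking whether a recognized value contains 125, 126, or 140 anywhere in the text, so a value such as 1250 would also match. A stricter check matches the number only as a whole token. The sketch below assumes that behavior is wanted. FormNameDetector and Detect are illustrative names, and the label text is copied from the listing.

```csharp
using System.Text.RegularExpressions;

public static class FormNameDetector
{
    // Form numbers taken from the listing above.
    private static readonly string[] KnownForms = { "125", "126", "140" };

    // Returns the label for the first known form number that appears as a
    // whole token in the text, or null if none is found.
    public static string Detect(string recognizedText)
    {
        if (string.IsNullOrEmpty(recognizedText))
        {
            return null;
        }
        foreach (string number in KnownForms)
        {
            // \b marks a word boundary, so "125" does not match inside "1250".
            if (Regex.IsMatch(recognizedText, @"\b" + number + @"\b"))
            {
                return "Accord Form " + number;
            }
        }
        return null;
    }
}
```

A caller can show the returned label when it is not null and fall back to the "Please upload valid form." message otherwise.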

