Easily Extract Values from PDF Without AI Using Only .NET 8

Introduction 

Many libraries can extract content from PDF files without using AI algorithms. In this post, we will use one of them, BitMiracle.Docotic.Pdf, to extract values from PDF invoices in a .NET 8 Web API and display them in a small Angular 16 client.

You can learn more about the BitMiracle PDF library at https://bitmiracle.com/pdf-library/

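I am going to create a Web API project using .NET 8.

Create .NET 8 project in Visual Studio 2022.

We can choose the Web API template and select .NET 8 as the framework.

We can add the PDF extraction library, BitMiracle.Docotic.Pdf, to this project using the NuGet package manager.

Make the changes below to the Program.cs file. They raise the form upload limits, allow the Angular client at http://localhost:4200 through CORS, and serve the Resources folder as static files.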

Program.cs 

using Microsoft.AspNetCore.Http.Features;
using Microsoft.Extensions.FileProviders;

var builder = WebApplication.CreateBuilder(args);
var MyAllowSpecificOrigins = "_myAllowSpecificOrigins";

// Add services to the container.

builder.Services.AddControllers();
// Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

builder.Services.Configure<FormOptions>(o =>
{
    o.ValueLengthLimit = int.MaxValue;
    o.MultipartBodyLengthLimit = int.MaxValue;
    o.MemoryBufferThreshold = int.MaxValue;
});

builder.Services.AddCors(options =>
{
    options.AddPolicy(name: MyAllowSpecificOrigins,
                      policy =>
                      {
                          policy.WithOrigins("http://localhost:4200");
                      });
});

var app = builder.Build();

app.UseHttpsRedirection();
app.UseCors(MyAllowSpecificOrigins);
app.UseStaticFiles();
app.UseStaticFiles(new StaticFileOptions()
{
    FileProvider = new PhysicalFileProvider(Path.Combine(Directory.GetCurrentDirectory(), @"Resources")),
    RequestPath = new PathString("/Resources")
});

// Configure the HTTP request pipeline.
if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.UseAuthorization();

app.MapControllers();

app.Run();

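We can create the FileUpload service now. It reads the text of the PDF, splits it into lines, and picks out the date, invoice number, and total by keyword. The InvoiceModel class it returns is not listed in this post; the code assumes a simple class with three string properties: Date, Invoice, and Total.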

FileUpload.cs

using BitMiracle.Docotic.Pdf;

namespace PDFExtraction.Services;

public static class FileUploadHelper
{
    // Extracts the invoice date, number, and total from the text of a PDF file.
    public static InvoiceModel GetPDFContent(string file)
    {
        using (PdfDocument pdf = new PdfDocument(file))
        {
            string text = pdf.GetText();

            // Split on both Windows and Unix line endings and skip empty lines.
            string[] contents = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
            string date = "";
            string invoice = "";
            string total = "";
            foreach (string content in contents)
            {
                string lower = content.ToLower();
                if (lower.Contains("date"))
                {
                    date = ValueAfter(content, ":", date);
                }
                if (lower.Contains("invoice #"))
                {
                    invoice = ValueAfter(content, "#", invoice);
                }
                if (lower.Contains("total"))
                {
                    total = ValueAfter(content, "$", total);
                }
            }
            InvoiceModel model = new InvoiceModel();
            model.Date = date;
            model.Invoice = invoice;
            model.Total = total;
            return model;
        }
    }

    // Returns the trimmed text after the first delimiter,
    // or the fallback when the line does not contain the delimiter.
    private static string ValueAfter(string line, string delimiter, string fallback)
    {
        string[] parts = line.Split(delimiter, 2);
        return parts.Length > 1 ? parts[1].Trim() : fallback;
    }
}
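
To make the keyword-and-delimiter rule easier to follow, here is a minimal sketch of the same line-by-line logic in TypeScript, the language of the Angular client used later in this post. The names `InvoiceFields`, `valueAfter`, and `parseInvoiceText` are hypothetical, and the sample text is only an assumption about what the library might return for a simple invoice; the real text depends on the layout of the PDF. In this sketch, a line without the expected delimiter leaves the field unchanged.

```typescript
// Hypothetical shape mirroring the three fields of InvoiceModel.
interface InvoiceFields {
  date: string;
  invoice: string;
  total: string;
}

// Returns the trimmed text after the first occurrence of `delimiter`,
// or `fallback` when the delimiter is not on the line.
function valueAfter(line: string, delimiter: string, fallback: string): string {
  const index = line.indexOf(delimiter);
  return index === -1 ? fallback : line.slice(index + delimiter.length).trim();
}

// Scans the text line by line and picks values by keyword and delimiter.
function parseInvoiceText(text: string): InvoiceFields {
  const fields: InvoiceFields = { date: "", invoice: "", total: "" };
  for (const line of text.split(/\r?\n/)) {
    const lower = line.toLowerCase();
    if (lower.indexOf("date") !== -1) {
      fields.date = valueAfter(line, ":", fields.date);
    }
    if (lower.indexOf("invoice #") !== -1) {
      fields.invoice = valueAfter(line, "#", fields.invoice);
    }
    if (lower.indexOf("total") !== -1) {
      fields.total = valueAfter(line, "$", fields.total);
    }
  }
  return fields;
}

// Assumed sample text; the real extracted text depends on the PDF layout.
const sampleInvoiceText = "INVOICE # 1\r\nDate: April 19, 2023\r\nTotal $60.00";
console.log(parseInvoiceText(sampleInvoiceText));
```

Note that any line containing the word "total" matches, so a "Subtotal" line would also set the total; when both are present, the last matching line wins.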

We can create the FileUploadController now. 

FileUploadController.cs 

using Microsoft.AspNetCore.Mvc;
using PDFExtraction.Services;

namespace PDFExtraction.Controllers
{
    [ApiController]
    [Route("[controller]")]
    public class FileUploadController : ControllerBase
    {
        [HttpPost("UploadFile"), DisableRequestSizeLimit]
        public ActionResult<InvoiceModel> Upload()
        {
            try
            {
                // Reject requests that carry no file or an empty file.
                if (Request.Form.Files.Count == 0 || Request.Form.Files[0].Length == 0)
                {
                    return BadRequest();
                }

                var file = Request.Form.Files[0];
                var folderName = Path.Combine("Resources", "Files");
                var pathToSave = Path.Combine(Directory.GetCurrentDirectory(), folderName);

                // Keep only the file name so a client-supplied path cannot escape the folder.
                var fileName = Path.GetFileName(file.FileName);
                var fullPath = Path.Combine(pathToSave, fileName);
                using (var stream = new FileStream(fullPath, FileMode.Create))
                {
                    file.CopyTo(stream);
                }

                // Read the saved file back using its absolute path.
                InvoiceModel value = FileUploadHelper.GetPDFContent(fullPath);
                return Ok(value);
            }
            catch (Exception ex)
            {
                // Return only the message so the stack trace is not exposed to the client.
                return StatusCode(500, $"Internal server error: {ex.Message}");
            }
        }
    }
}

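We must create a “Resources” folder in the project root and a “Files” folder inside it. Program.cs serves “Resources” as static files, and the controller saves uploaded files to “Resources/Files”.

Backend coding is completed. You can use the Postman tool to test the API methods if needed.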

Create FileUpload component in Angular 16. 

First, we create a new Angular application.

ng new FileUpload16Angular 

We can create a new component FileUpload now. 

ng g c FileUpload 

Copy the code below and paste inside the component class file. 

file-upload.component.ts 

import { Component, OnInit } from '@angular/core';
import { HttpClient, HttpEventType } from '@angular/common/http';

@Component({
  selector: 'app-file-upload',
  templateUrl: './file-upload.component.html',
  styleUrls: ['./file-upload.component.css']
})
export class FileUploadComponent  implements OnInit {
 
  constructor(private http: HttpClient) {}
  date!:string;
  invoice!:string;
  total!:string;
  showDetails!:boolean;

  ngOnInit() {
    this.showDetails = false;
  } 

  onFileSelected(event: any) {
    const file: File = event.target.files[0];
    if (file) {
      const formData = new FormData();
      formData.append("file", file);

      const upload$ = this.http.post("https://localhost:5000/FileUpload/uploadfile", formData, {
        reportProgress: true,
        observe: 'events'
      });

      upload$.subscribe(event => {
        if (event.type === HttpEventType.UploadProgress) {
          console.log('Upload Progress:', Math.round(100 * event.loaded / (event.total ?? 1)));
        }
        else if (event.type === HttpEventType.Response) {
          const value: any = event.body;
          this.date = value.date;
          this.invoice = value.invoice;
          this.total = value.total;
          this.showDetails = true;
        }
      });
    }
  }
}
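
The component reads the response body as `any`, so a missing or misspelled property would only show up as an empty value on the page. As a minimal sketch, a typed interface plus a type guard can check the body before its values are used. The names `InvoiceResponse` and `isInvoiceResponse` are hypothetical, and the camelCase property names are assumed from the way the component reads `value.date`, `value.invoice`, and `value.total`.

```typescript
// Assumed JSON shape of the API response, as read by the component.
interface InvoiceResponse {
  date: string;
  invoice: string;
  total: string;
}

// Type guard: true only when `body` is an object with three string fields.
function isInvoiceResponse(body: unknown): body is InvoiceResponse {
  if (typeof body !== "object" || body === null) {
    return false;
  }
  const candidate = body as Record<string, unknown>;
  return typeof candidate["date"] === "string"
    && typeof candidate["invoice"] === "string"
    && typeof candidate["total"] === "string";
}

// Example: validate a parsed body before reading its fields.
const sampleBody: unknown = JSON.parse('{"date":"April 19, 2023","invoice":"1","total":"60.00"}');
if (isInvoiceResponse(sampleBody)) {
  console.log(sampleBody.date, sampleBody.invoice, sampleBody.total);
}
```

Inside the subscribe callback, the same guard could wrap the assignments to `date`, `invoice`, and `total`, so that `showDetails` is only set when the body has the expected shape.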

Copy the code below and paste inside the template file. 

file-upload.component.html 

<input type="file" (change)="onFileSelected($event)">

<div *ngIf="showDetails">
    <h1>
        Invoice Date : {{date}}
    </h1>
    <h1>
        Invoice No : {{invoice}}
    </h1>
    <h1>
        Total : {{total}}
    </h1>
</div>

Modify the app component file. 

app.component.ts 

import { Component } from '@angular/core';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent {
  title = 'Angular16Upload';
}

Also change the app component template file so that it hosts the file upload component. The component displays its own results, so no event binding is needed here.

app.component.html 

<app-file-upload></app-file-upload>

The component injects HttpClient, so we must also add HttpClientModule to the imports array of the app module class (app.module.ts).

We can run the application now. 

For testing purposes, we have created two sample PDF files. 

In the first invoice, the date is April 19, 2023, the invoice number is 1, and the total is $60.00.

In the second invoice, the date is April 24, 2023, the invoice number is 2, and the total is $100.00.
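
The API returns every extracted value as text, including the total. If the client needs to compare or sum totals, the text has to be converted to a number first. The following is a minimal sketch under that assumption; `toAmount` is a hypothetical helper and not part of the project above.

```typescript
// Converts a currency string such as "$60.00" or "1,250.50" to a number.
// Returns NaN when the text contains no digits.
function toAmount(total: string): number {
  // Drop currency symbols, thousands separators, and spaces.
  const cleaned = total.replace(/[^0-9.\-]/g, "");
  return cleaned === "" ? NaN : parseFloat(cleaned);
}

// Sum the totals of the two sample invoices.
const combinedTotal = toAmount("$60.00") + toAmount("$100.00");
console.log(combinedTotal);
```

The helper only strips formatting characters; it does not handle locales that use a comma as the decimal separator.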

We can run the Web API and the Angular application together. 

Upload the first invoice file and let's see the result. 

The invoice date, number, and total are fetched correctly. 

Upload the second invoice file and let's see the result. 

The invoice date, number, and total are fetched correctly for the second document as well. 

Conclusion 

In this post, we have seen how to extract values from a PDF file without using AI algorithms, using the BitMiracle.Docotic.Pdf library in a .NET 8 Web API with an Angular 16 client.