How to Search Text in PDF Files on OneDrive Using Telerik

In this article, I will demonstrate how to use the Telerik PdfFormatProvider and TextFormatProvider to search for text inside PDF files on OneDrive, and the RadGridView component to group the results by column.

To demonstrate, I built a .NET 7 WinForms app in C# 11, which you can download and use with a Progress Telerik UI for WinForms license.

This solution uses the Default theme for WinForms. Let’s take a look.

The user can specify a file name pattern such as “MyFiles*.pdf”, limit the maximum number of result files, and enter the text to search for inside the PDF content.


The Code

To extract all the text from a PDF file, you can use this code:

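using System.IO;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.FormatProviders.Text;

var provider = new PdfFormatProvider();
using Stream stream = File.OpenRead(cFileName);
var document = provider.Import(stream);              // RadFixedDocument
var textFormatProvider = new TextFormatProvider();
var result = textFormatProvider.Export(document);    // plain text of all pages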

We get the OneDrive path from the Windows environment variables:

var oneDrivePath = Environment.GetEnvironmentVariable("OneDrive");
if (!Directory.Exists(oneDrivePath))
{
    throw new DirectoryNotFoundException("OneDrive directory not found!");
}
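Depending on the account type, Windows can also expose the OneDrive folder through the `OneDriveConsumer` (personal) and `OneDriveCommercial` (work or school) environment variables. The helper below is a minimal sketch, assuming you want to fall back to those variables when `OneDrive` is not set; `OneDriveLocator` and `GetOneDrivePath` are hypothetical names, not part of the demo app.

```csharp
using System;
using System.IO;

public static class OneDriveLocator
{
    // Hypothetical helper: checks the variables in order and returns the
    // first one that points to an existing directory.
    private static readonly string[] CandidateVariables =
    {
        "OneDrive",           // the variable used in the article, checked first
        "OneDriveConsumer",   // personal account, if configured
        "OneDriveCommercial"  // work or school account, if configured
    };

    public static string GetOneDrivePath()
    {
        foreach (var name in CandidateVariables)
        {
            var path = Environment.GetEnvironmentVariable(name);
            if (!string.IsNullOrEmpty(path) && Directory.Exists(path))
            {
                return path;
            }
        }

        // Same failure message as the article's check.
        throw new DirectoryNotFoundException("OneDrive directory not found!");
    }
}
```

In the search method, `var oneDrivePath = OneDriveLocator.GetOneDrivePath();` would then replace the lookup and the `Directory.Exists` check.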

In the demo app, I added some extra code.

It reads each file on a background thread and waits at most 5 seconds for it (you can raise this to 30 seconds on slow PCs).

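Some PDF files take too long to process because of how they were created, which is why the read is capped by a timeout. The import is also wrapped in a try-catch, so the search skips encrypted (password-protected) files and any other file that fails to load.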

public static string ReadTextFromPdf(string cFileName)
{
    const int PMaxTimeInSecondsToWait = 5;
    var result = "";
    var th = new Thread(() =>
    {
        try
        {
            var provider = new PdfFormatProvider();
            using Stream stream = File.OpenRead(cFileName);
            var document = provider.Import(stream);
            var textFormatProvider = new TextFormatProvider();
            result = textFormatProvider.Export(document);
        }
        catch
        {
            // Unexpected errors are ignored.
            // Password-protected documents cannot be opened, so they are skipped.
        }
    })
    {
        IsBackground = true
    };
    th.Start();
    var deadline = DateTime.Now.AddSeconds(PMaxTimeInSecondsToWait);
    // Wait only while the worker is still running. A failed import ends the
    // thread with an empty result, so there is nothing more to wait for.
    while (!th.Join(50))
    {
        if (DateTime.Now > deadline)
        {
            // Interrupt() cannot stop CPU-bound work: the background thread keeps
            // running until it finishes or the app exits; its result is discarded.
            th.Interrupt();
            return string.Empty;
        }
        // Keep the UI responsive while waiting.
        System.Windows.Forms.Application.DoEvents();
    }
    return result;
}
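If you prefer to avoid the polling loop and `Application.DoEvents()`, the same "give up after N seconds" idea can be sketched with `Task.Run` and `Task.WhenAny`. This is an alternative technique, not the demo app's code, and `TimeoutRunner.RunWithTimeoutAsync` is a hypothetical helper. As with the thread version, a timed-out job is not stopped; it keeps running in the background and its result is ignored.

```csharp
using System;
using System.Threading.Tasks;

public static class TimeoutRunner
{
    // Hypothetical helper: runs `work` on the thread pool and returns `fallback`
    // if the work does not finish within `timeout` or if it throws.
    public static async Task<T> RunWithTimeoutAsync<T>(Func<T> work, TimeSpan timeout, T fallback)
    {
        var task = Task.Run(work);
        var finished = await Task.WhenAny(task, Task.Delay(timeout));
        if (finished != task)
        {
            // Timed out: the work keeps running, but its result is discarded.
            return fallback;
        }

        try
        {
            return await task;
        }
        catch
        {
            // Same policy as ReadTextFromPdf: errors (e.g. encrypted PDFs) yield the fallback.
            return fallback;
        }
    }
}
```

In the demo, `work` would be the import/export code shown at the start of this section, with `string.Empty` as the fallback, and the caller would `await` the helper from an `async` event handler instead of pumping messages with `DoEvents()`.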

This is the main method of the search engine; let’s analyze it:

public static List<FileInfo>? SearchAllPdfFilesInOneDrive(
        RadMain frm, 
        RadLabel lbl, 
        RadGridView grid, 
        string pattern, 
        string searchText, 
        int maxFiles = 50)
{
    grid.AllowAddNewRow = true;
    var startTime = DateTime.Now;
    try
    {
        frm.Text = $@"{AppUtil.PFormTitle} | Searching...";
        Application.DoEvents();

        var dsFiles = new List<DsFiles>();
        searchText = searchText.ToUpper();

        var oneDrivePath = Environment.GetEnvironmentVariable("OneDrive");
        if (!Directory.Exists(oneDrivePath))
        {
            throw new DirectoryNotFoundException("OneDrive directory not found!");
        }

        var pdfFiles = ReadAllFiles(oneDrivePath, pattern)
            .OrderByDescending(t=> new FileInfo(t).LastWriteTime).ToList();

        frm.Text = $@"{AppUtil.PFormTitle} | Files found 0/0/{pdfFiles.Count}...";
        Application.DoEvents();
        var result = new List<FileInfo>();
        var nCurrent = 0;

        foreach (var file in pdfFiles)
        {
            nCurrent++;
            if (frm.IsDisposed || frm.ExitSearch)
            {
                return null;
            }

            frm.Text = $@"{AppUtil.PFormTitle} | Files found {result.Count}/{nCurrent}/{pdfFiles.Count} - {(DateTime.Now - startTime):dd\.hh\:mm\:ss}...";
            lbl.Text = $@"Searching text in: {new FileInfo(file).Name}";
            Application.DoEvents();

            if (!PdfTools.ReadTextFromPdf(file).ToUpper().Contains(searchText)) continue;

            var fileInfo = new FileInfo(file);
            result.Add(fileInfo);
            var item = new DsFiles()
            {
                FileName = fileInfo.Name,
                Year = fileInfo.LastWriteTime.Year,
                Month = fileInfo.LastWriteTime.Month,
                LastModified = fileInfo.LastWriteTime,
                FullName = file,
                CreationTime = fileInfo.CreationTime,
                Path = fileInfo.DirectoryName
            };

            dsFiles.Add(item);

            if (grid.IsDisposed) return null;
            if (dsFiles.Count == 1)
            {
                grid.DataSource = dsFiles;
                grid.Enabled = true;
            }
            else
            {
                grid.Rows.Add(
                    item.FileName,
                    item.Year,
                    item.Month,
                    item.LastModified,
                    item.FullName,
                    item.CreationTime,
                    item.Path);
            }

            frm.Text = $@"{AppUtil.PFormTitle} | Files found {result.Count}/{nCurrent}/{pdfFiles.Count}...";
            Application.DoEvents();

            if (result.Count >= maxFiles) return result;
        }

        return result;
    }
    catch (Exception ex)
    {
        RadMessageBox.Show(ex.Message,"Error", MessageBoxButtons.OK, RadMessageIcon.Error, detailsText: ex.StackTrace);
        return null;
    }
}
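The `DsFiles` class is not listed in the article. Judging from the object initializer above, a minimal sketch could look like this; the property types are assumptions inferred from the `FileInfo` values assigned to them. Because the grid presumably generates its columns from these public properties when `grid.DataSource` is set, keeping the declaration order in sync with the `grid.Rows.Add(...)` call matters.

```csharp
using System;

// Hypothetical reconstruction of the DsFiles row type used by the search method.
// Property order matches the order of values passed to grid.Rows.Add(...).
public class DsFiles
{
    public string FileName { get; set; } = "";
    public int Year { get; set; }
    public int Month { get; set; }
    public DateTime LastModified { get; set; }
    public string FullName { get; set; } = "";
    public DateTime CreationTime { get; set; }
    // FileInfo.DirectoryName is nullable, hence string?.
    public string? Path { get; set; }
}
```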

The method gets the list of OneDrive files that match the pattern and sorts it by last-modified timestamp, newest first:

var pdfFiles = ReadAllFiles(oneDrivePath, pattern)
    .OrderByDescending(t=> new FileInfo(t).LastWriteTime).ToList();
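`ReadAllFiles` is not shown in the article either. A minimal sketch, assuming it recursively lists every file under the OneDrive folder that matches the pattern, could use `Directory.EnumerateFiles` with `EnumerationOptions` (.NET Core 2.1 and later), which skips entries the user cannot access instead of throwing. `FileSearch` is a hypothetical class name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSearch
{
    // Hypothetical stand-in for the article's ReadAllFiles helper.
    public static IEnumerable<string> ReadAllFiles(string rootPath, string pattern)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,             // search the whole OneDrive tree
            IgnoreInaccessible = true,                // skip entries that deny access
            MatchCasing = MatchCasing.CaseInsensitive // "*.pdf" also matches ".PDF"
        };
        return Directory.EnumerateFiles(rootPath, pattern, options);
    }
}
```

Returning a lazy `IEnumerable<string>` fits the caller, which materializes it with `OrderByDescending(...).ToList()`.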

The search is case-insensitive because both the search text and the extracted PDF text are converted to upper case with ToUpper():

if (!PdfTools.ReadTextFromPdf(file).ToUpper().Contains(searchText))
    continue;
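Calling `ToUpper()` allocates an upper-case copy of the whole extracted text and depends on the current culture. A possible alternative on .NET Core 2.1 or later (the demo targets .NET 7) is the `string.Contains` overload that takes a `StringComparison`, which avoids those copies; with it, the search method would no longer need `searchText.ToUpper()`. `TextMatch.ContainsIgnoreCase` is a hypothetical helper name.

```csharp
using System;

public static class TextMatch
{
    // Hypothetical helper: case-insensitive substring test without ToUpper() copies.
    // OrdinalIgnoreCase compares characters independently of the current culture.
    public static bool ContainsIgnoreCase(string text, string searchText) =>
        text.Contains(searchText, StringComparison.OrdinalIgnoreCase);
}
```

The check in the loop would then read `if (!TextMatch.ContainsIgnoreCase(PdfTools.ReadTextFromPdf(file), searchText)) continue;`.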

On the RadMain form, we added a public Boolean property, ExitSearch, which the FormClosing handler sets to stop the search:

private void RadMain_FormClosing(object sender, FormClosingEventArgs e)
{
    ExitSearch = true;
}
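A Boolean flag works here because the search loop and `FormClosing` both run on the UI thread; the loop yields through `Application.DoEvents()`. If the search were moved to a background task, a `CancellationTokenSource` would be the usual alternative. A minimal sketch with hypothetical names (`CancellableSearch.FindMatches`) of a loop that stops as soon as cancellation is requested:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CancellableSearch
{
    // Hypothetical sketch: processes files until the token is cancelled,
    // playing the role ExitSearch plays in the article. Returns null when cancelled.
    public static List<string>? FindMatches(
        IEnumerable<string> files,
        Func<string, bool> isMatch,
        CancellationToken token)
    {
        var result = new List<string>();
        foreach (var file in files)
        {
            if (token.IsCancellationRequested)
            {
                return null; // mirrors "return null" when the form is closing
            }
            if (isMatch(file))
            {
                result.Add(file);
            }
        }
        return result;
    }
}
```

On the form, a `CancellationTokenSource` field would replace ExitSearch, and the FormClosing handler would call its `Cancel()` method.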

In the search method, we then check whether the form has been disposed or FormClosing has fired:

if (frm.IsDisposed || frm.ExitSearch)
{
    return null;
}

In this hypothetical sample result, the user grouped the grid by the Path and Year columns.

The same PDF text search can be used on other platforms, such as Telerik UI for ASP.NET Core or Telerik UI for Blazor.

You can download a trial of Progress Telerik DevCraft from https://www.telerik.com/try/devcraft-ultimate.

Adding Your Licensed Telerik Packages to This Solution

1. Open the NuGet Package Manager from Solution Explorer.


2. Select the Telerik package source.


3. Search for "70" and install UI.ForWinforms.AllControls.Net70.


And that is it! Ready to run!


Happy coding!

