Understanding String Algorithms

Introduction

String Algorithms are essential for solving problems related to text processing, searching, pattern matching, and data compression. These algorithms are widely used in:

  • Search engines

  • Text editors

  • DNA sequence analysis

  • Plagiarism detection

  • File compression

In this chapter, we will learn the most important string algorithms such as:

  • Naive Pattern Matching

  • KMP Algorithm

  • Rabin-Karp Algorithm

  • Z-Algorithm

  • Longest Common Prefix (LCP)

  • Longest Palindromic Substring

Understanding these algorithms will give you a strong foundation for solving real-world string-related problems.

1. Naive Pattern Matching

The simplest pattern matching technique checks for the pattern at every index of the string.

Example

Text: abcxabcdabcdabcy
Pattern: abcdabcy

Time Complexity

  • Worst: O(n × m)

C# Code

int NaiveSearch(string text, string pattern)
{
    for (int i = 0; i <= text.Length - pattern.Length; i++)
    {
        int j;
        for (j = 0; j < pattern.Length; j++)
        {
            if (text[i + j] != pattern[j]) break;
        }
        if (j == pattern.Length) return i;
    }
    return -1;
}

2. KMP Algorithm (Knuth–Morris–Pratt)

KMP improves over the naive method by avoiding re-checking previously matched characters.

It uses a LPS (Longest Prefix Suffix) array.

Why KMP?

  • Fast pattern matching

  • Avoids redundant comparisons

Time Complexity

  • O(n + m)

Steps

  1. Build LPS array for the pattern

  2. Use it to optimize search in text

C# Code (LPS Array)

int[] BuildLPS(string pattern)
{
    int[] lps = new int[pattern.Length];
    int len = 0;

    for (int i = 1; i < pattern.Length;)
    {
        if (pattern[i] == pattern[len])
        {
            lps[i++] = ++len;
        }
        else if (len > 0)
        {
            len = lps[len - 1];
        }
        else
        {
            lps[i++] = 0;
        }
    }
    return lps;
}

KMP Search

int KMPSearch(string text, string pattern)
{
    int[] lps = BuildLPS(pattern);
    int i = 0, j = 0;

    while (i < text.Length)
    {
        if (text[i] == pattern[j])
        {
            i++; j++;
        }
        if (j == pattern.Length)
        {
            return i - j;
        }
        else if (i < text.Length && text[i] != pattern[j])
        {
            if (j != 0) j = lps[j - 1];
            else i++;
        }
    }
    return -1;
}

3. Rabin-Karp Algorithm

Rabin-Karp uses hashing to match patterns quickly.

Why Rabin-Karp?

  • Efficient for multiple pattern searches

  • Useful for plagiarism detection

Time Complexity

  • Average: O(n + m)

  • Worst: O(n × m)

C# Implementation

int RabinKarp(string text, string pattern)
{
    int m = pattern.Length;
    int n = text.Length;
    long pHash = 0, tHash = 0, h = 1;
    int d = 256;
    int q = 101;

    for (int i = 0; i < m - 1; i++) h = (h * d) % q;

    for (int i = 0; i < m; i++)
    {
        pHash = (d * pHash + pattern[i]) % q;
        tHash = (d * tHash + text[i]) % q;
    }

    for (int i = 0; i <= n - m; i++)
    {
        if (pHash == tHash)
        {
            if (text.Substring(i, m) == pattern) return i;
        }

        if (i < n - m)
        {
            tHash = (d * (tHash - text[i] * h) + text[i + m]) % q;
            if (tHash < 0) tHash += q;
        }
    }
    return -1;
}

4. Z-Algorithm

The Z-algorithm creates a Z-array where each entry represents the longest substring starting from index i that matches the prefix.

Why Z-Algorithm?

  • Fastest for pattern matching

  • Time Complexity: O(n + m)

Applications

  • String searching

  • Prefix computation

  • DNA sequencing

5. Longest Palindromic Substring

A palindrome reads the same forward and backward.

Example

Text: babad
Output: bab or aba

Expand Around Center Method

string LongestPalindrome(string s)
{
    int start = 0, maxLen = 1;

    for (int i = 0; i < s.Length; i++)
    {
        Expand(s, i, i);
        Expand(s, i, i + 1);
    }

    void Expand(string str, int left, int right)
    {
        while (left >= 0 && right < str.Length && str[left] == str[right])
        {
            if (right - left + 1 > maxLen)
            {
                start = left;
                maxLen = right - left + 1;
            }
            left--; right++;
        }
    }

    return s.Substring(start, maxLen);
}

Time Complexity

  • O(n²)

6. Longest Common Prefix (LCP)

Given strings, find the longest prefix shared by all.

Example

Input: ["flower", "flow", "flight"]
Output: "fl"

Code

string LongestCommonPrefix(string[] strs)
{
    string prefix = strs[0];

    for (int i = 1; i < strs.Length; i++)
    {
        while (!strs[i].StartsWith(prefix))
        {
            prefix = prefix[..^1];
            if (prefix == "") return "";
        }
    }
    return prefix;
}

Summary

String algorithms solve essential problems like pattern matching, searching, and substring computations.

Key Takeaways

  • Naive Matching ? simple but slow

  • KMP ? prefix table + fast matching

  • Rabin-Karp ? hashing-based searching

  • Z-Algorithm ? fast prefix matching

  • Palindromic algorithms ? used in DNA and text analysis

  • LCP ? used in search engines and autocomplete systems