Understanding String Algorithms
Introduction
String Algorithms are essential for solving problems related to text processing, searching, pattern matching, and data compression. These algorithms are widely used in:
Search engines
Text editors
DNA sequence analysis
Plagiarism detection
File compression
In this chapter, we will learn the most important string algorithms such as:
Naive Pattern Matching
KMP Algorithm
Rabin-Karp Algorithm
Z-Algorithm
Longest Common Prefix (LCP)
Longest Palindromic Substring
Understanding these algorithms will give you a strong foundation for solving real-world string-related problems.
1. Naive Pattern Matching
The simplest pattern matching technique checks for the pattern at every index of the string.
Example
Text: abcxabcdabcdabcy
Pattern: abcdabcy
Time Complexity
Worst: O(n × m)
C# Code
int NaiveSearch(string text, string pattern)
{
for (int i = 0; i <= text.Length - pattern.Length; i++)
{
int j;
for (j = 0; j < pattern.Length; j++)
{
if (text[i + j] != pattern[j]) break;
}
if (j == pattern.Length) return i;
}
return -1;
}
2. KMP Algorithm (Knuth–Morris–Pratt)
KMP improves over the naive method by avoiding re-checking previously matched characters.
It uses a LPS (Longest Prefix Suffix) array.
Why KMP?
Fast pattern matching
Avoids redundant comparisons
Time Complexity
O(n + m)
Steps
Build LPS array for the pattern
Use it to optimize search in text
C# Code (LPS Array)
int[] BuildLPS(string pattern)
{
int[] lps = new int[pattern.Length];
int len = 0;
for (int i = 1; i < pattern.Length;)
{
if (pattern[i] == pattern[len])
{
lps[i++] = ++len;
}
else if (len > 0)
{
len = lps[len - 1];
}
else
{
lps[i++] = 0;
}
}
return lps;
}
KMP Search
int KMPSearch(string text, string pattern)
{
int[] lps = BuildLPS(pattern);
int i = 0, j = 0;
while (i < text.Length)
{
if (text[i] == pattern[j])
{
i++; j++;
}
if (j == pattern.Length)
{
return i - j;
}
else if (i < text.Length && text[i] != pattern[j])
{
if (j != 0) j = lps[j - 1];
else i++;
}
}
return -1;
}
3. Rabin-Karp Algorithm
Rabin-Karp uses hashing to match patterns quickly.
Why Rabin-Karp?
Efficient for multiple pattern searches
Useful for plagiarism detection
Time Complexity
Average: O(n + m)
Worst: O(n × m)
C# Implementation
int RabinKarp(string text, string pattern)
{
int m = pattern.Length;
int n = text.Length;
long pHash = 0, tHash = 0, h = 1;
int d = 256;
int q = 101;
for (int i = 0; i < m - 1; i++) h = (h * d) % q;
for (int i = 0; i < m; i++)
{
pHash = (d * pHash + pattern[i]) % q;
tHash = (d * tHash + text[i]) % q;
}
for (int i = 0; i <= n - m; i++)
{
if (pHash == tHash)
{
if (text.Substring(i, m) == pattern) return i;
}
if (i < n - m)
{
tHash = (d * (tHash - text[i] * h) + text[i + m]) % q;
if (tHash < 0) tHash += q;
}
}
return -1;
}
4. Z-Algorithm
The Z-algorithm creates a Z-array where each entry represents the longest substring starting from index i that matches the prefix.
Why Z-Algorithm?
Fastest for pattern matching
Time Complexity: O(n + m)
Applications
String searching
Prefix computation
DNA sequencing
5. Longest Palindromic Substring
A palindrome reads the same forward and backward.
Example
Text: babad
Output: bab or aba
Expand Around Center Method
string LongestPalindrome(string s)
{
int start = 0, maxLen = 1;
for (int i = 0; i < s.Length; i++)
{
Expand(s, i, i);
Expand(s, i, i + 1);
}
void Expand(string str, int left, int right)
{
while (left >= 0 && right < str.Length && str[left] == str[right])
{
if (right - left + 1 > maxLen)
{
start = left;
maxLen = right - left + 1;
}
left--; right++;
}
}
return s.Substring(start, maxLen);
}
Time Complexity
O(n²)
6. Longest Common Prefix (LCP)
Given strings, find the longest prefix shared by all.
Example
Input: ["flower", "flow", "flight"]
Output: "fl"
Code
string LongestCommonPrefix(string[] strs)
{
string prefix = strs[0];
for (int i = 1; i < strs.Length; i++)
{
while (!strs[i].StartsWith(prefix))
{
prefix = prefix[..^1];
if (prefix == "") return "";
}
}
return prefix;
}
Summary
String algorithms solve essential problems like pattern matching, searching, and substring computations.
Key Takeaways
Naive Matching ? simple but slow
KMP ? prefix table + fast matching
Rabin-Karp ? hashing-based searching
Z-Algorithm ? fast prefix matching
Palindromic algorithms ? used in DNA and text analysis
LCP ? used in search engines and autocomplete systems