Jayeskumar M
What is KMP Algorithm?

What is KMP Algorithm for pattern matching?

By Jayeskumar M in .NET on Jul 11 2020
  • Guest User
    Jul, 2022 21

    I see the link I have posted was already shared. Good to know others also found it useful!

    • 1
  • Guest User
    Jul, 2022 21

    I see the link I have posted was already shared. Good to know others also found it useful!

    • 0
  • Guest User
    Jul, 2022 21

    KMP is an advanced pattern matching algorithm. I have seen people getting confused while learning it. Here is an article about KMP algorithm which explains the concept & time complexity in simple terms. I mean it is simpler than most other KMP resources :)

    • 0
  • hantoniv kano
    Aug, 2021 20

    KMP is an advanced algorithm & lot of resources are present in the internet. I would recommend https://www.w3spot.com/2020/07/kmp-algorithm-explained-in-plain-english.html as a primer. It can be as simple as it gets if you follow through the example there.

    • 0
  • Archana Parmar
    Sep, 2020 24

    KMP
    (Knuth Morris Pratt) Pattern Searching
    The Naive pattern searching algorithm doesn’t work well in cases where we see many matching characters followed by a mismatching character. Following are some examples.

    txt[] = “AAAAAAAAAAAAAAAAAB”
    pat[] = “AAAAB”

    txt[] = “ABABABCABABABCABABABC”
    pat[] = “ABABAC” (not a worst case, but a bad case for Naive)
    The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of the next window. We take advantage of this information to avoid matching the characters that we know will anyway match. Let us consider below example to understand this.

    Matching Overview
    txt = “AAAAABAAABA”
    pat = “AAAA”

    We compare first window of txt with pat
    txt = “AAAAABAAABA”
    pat = “AAAA” [Initial position]
    We find a match. This is same as Naive String Matching.

    In the next step, we compare next window of txt with pat.
    txt = “AAAAABAAABA”
    pat = “AAAA” [Pattern shifted one position]
    This is where KMP does optimization over Naive. In this second window, we only compare fourth A of pattern with fourth character of current window of text to decide whether current window matches or not. Since we know first three characters will anyway match, we skipped matching first three characters.

    Need of Preprocessing?
    An important question arises from the above explanation, how to know how many characters to be skipped. To know this, we pre-process pattern and prepare an integer array lps[] that tells us the count of characters to be skipped.

    1. //program can be convert in c#
    2. // C++ program for implementation of KMP pattern searching
    3. // algorithm
    4. void computeLPSArray(char* pat, int M, int* lps);
    5. // Prints occurrences of txt[] in pat[]
    6. void KMPSearch(char* pat, char* txt)
    7. {
    8. int M = strlen(pat);
    9. int N = strlen(txt);
    10. // create lps[] that will hold the longest prefix suffix
    11. // values for pattern
    12. int lps[M];
    13. // Preprocess the pattern (calculate lps[] array)
    14. computeLPSArray(pat, M, lps);
    15. int i = 0; // index for txt[]
    16. int j = 0; // index for pat[]
    17. while (i <; N) {
    18. if (pat[j] == txt[i]) {
    19. j++;
    20. i++;
    21. }
    22. if (j == M) {
    23. printf("Found pattern at index %d ", i - j);
    24. j = lps[j - 1];
    25. }
    26. // mismatch after j matches
    27. else if (i <; N && pat[j] != txt[i]) {
    28. // Do not match lps[0..lps[j-1]] characters,
    29. // they will match anyway
    30. if (j != 0)
    31. j = lps[j - 1];
    32. else
    33. i = i + 1;
    34. }
    35. }
    36. }
    37. // Fills lps[] for given patttern pat[0..M-1]
    38. void computeLPSArray(char* pat, int M, int* lps)
    39. {
    40. // length of the previous longest prefix suffix
    41. int len = 0;
    42. lps[0] = 0; // lps[0] is always 0
    43. // the loop calculates lps[i] for i = 1 to M-1
    44. int i = 1;
    45. while (i <; M) {
    46. if (pat[i] == pat[len]) {
    47. len++;
    48. lps[i] = len;
    49. i++;
    50. }
    51. else // (pat[i] != pat[len])
    52. {
    53. // This is tricky. Consider the example.
    54. // AAACAAAA and i = 7. The idea is similar
    55. // to search step.
    56. if (len != 0) {
    57. len = lps[len - 1];
    58. // Also, note that we do not increment
    59. // i here
    60. }
    61. else // if (len == 0)
    62. {
    63. lps[i] = 0;
    64. i++;
    65. }
    66. }
    67. }
    68. }
    69. // Driver program to test above function
    70. int main()
    71. {
    72. char txt[] = "ABABDABACDABABCABAB";
    73. char pat[] = "ABABCABAB";
    74. KMPSearch(pat, txt);
    75. return 0;
    76. }

    Output:
    Found pattern at index 10

    //Whats your opinion ??

    • 0
  • Roshan Rathod
    Jul, 2020 22

    Knuth Morris Pratt (KMP) is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than one in the sub-pattern, it uses that property to improve the time complexity, also for in the worst case.

    • 0


Most Popular Job Functions


MOST LIKED QUESTIONS