Questions about the KMP search algorithm

For discussions about programming, and for programming questions and advice


Moderator: Forum moderators

Post Reply
KalamyQ
Posts: 16
Joined: Wed Jul 13, 2022 10:59 am
Has thanked: 3 times

Questions about the KMP search algorithm

Post by KalamyQ »

I'm having trouble comprehending the KMP algorithm. I understand prefix-suffix and have written code to calculate the prefix-suffix table:

Code: Select all

 private int[] calculatePrefSuffArray(String pattern) {
    char patternArray[] = pattern.toCharArray();
    int tab[] = new int[pattern.length()];
    tab[0] = 0;
    int t = tab[0];
    for (int i = 1; i < pattern.length(); i++) { // t = tab[i-1]
        while(t >0 && patternArray[i] != patternArray[t] ) {
            t = tab[t-1];
        }
        if(patternArray[i] == patternArray[t]) t++;
        tab[i] = t;
    }
    return tab;
}

This is the algorithm outlined in the scaler topics. But I'm not sure how to utilize it in KMP. Could you please explain it to me?

Burunduk
Posts: 245
Joined: Thu Jun 16, 2022 6:16 pm
Has thanked: 6 times
Been thanked: 123 times

Re: KMP algorithm

Post by Burunduk »

Is it java? I don't understand java. Rewritten in python, it seems to work properly. So, you know how to get the lps table but don't know what to do with it next, do you?

The site you've linked to has a detailed explanation already. Basically all the fuss is about partial matches. For random strings the simple search can be quite good but for long series of identical characters the KMP algorithm is more efficient.

Let's rewrite the code from geeksforgeeks.org to see the difference:

Code: Select all

import sys

def lps(s):
  tab = [0] * len(s)
  t = tab[0]
  for i in range(1, len(s)):
    while t > 0 and s[i] != s[t]:
      t = tab[t-1]
    if s[i] == s[t]: t = t + 1
    tab[i] = t
  return tab

def simple_search(pattern, string):
  patlen = len(pattern)
  strlen = len(string)
  
  i = j = 0
  
  cnt = 0
  while i < strlen:
    cnt += 1
    if pattern[j] == string[i]:
      i += 1
      j += 1
      if j == patlen:
        print("pattern found at %i" % (i - j))
      else: continue
    i = i - j + 1 # may need to backtrack i here
    j = 0
  print("Simple: %i iteration(s)" % cnt)

def kmp_search(pattern, string):
  patlen = len(pattern)
  strlen = len(string)
  
  tab = lps(pattern)
  
  i = j = 0
  
  cnt = 0
  while i < strlen:
    cnt += 1
    if pattern[j] == string[i]:
      i += 1
      j += 1
      if j == patlen:
        print("pattern found at %i" % (i - j))
      else: continue
    if j != 0:
      j = tab[j-1] # use the table for j, don't backtrack i
    else:
      i += 1
  print("KMP:    %i iteration(s)" % cnt)

word = sys.argv[1]
text = sys.argv[2]
simple_search(word, text)
kmp_search(word, text)

The tests show that the KMP search needs fewer iterations to complete:

Code: Select all

root# python lps.py puppy "the puppy linux"
pattern found at 4
Simple: 21 iteration(s)
pattern found at 4
KMP:    15 iteration(s)
root# python lps.py aaaaa aaaaabaaaaaa
pattern found at 0
pattern found at 6
pattern found at 7
Simple: 34 iteration(s)
pattern found at 0
pattern found at 6
pattern found at 7
KMP:    16 iteration(s)
root# python lps.py AABA AABAACAADAABAABA
pattern found at 0
pattern found at 9
pattern found at 12
Simple: 34 iteration(s)
pattern found at 0
pattern found at 9
pattern found at 12
KMP:    20 iteration(s)
urin123
Posts: 1
Joined: Wed Nov 23, 2022 1:53 pm

Re: KMP algorithm

Post by urin123 »

Frankly, you need to understand the basic concept before jumping into the code details. Hope you have enough resources to go through the topic. It is an advanced algorithm. If you still have doubt, you can go through https://www.w3spot.com/2020/07/kmp-algo ... glish.html. It contains examples with identical characters & non-identical characters both. Once you grasp the concept, you can go to GeeksForGeeks & check the code implementation.

Post Reply

Return to “Programming”