Explaination of Knuth–Morris–Pratt string-searching algorithm (or KMP).
Knuth–Morris–Pratt string-searching algorithm (or KMP algorithm) is a string search algorithm.
It’s named with the combined names of authors and first published at 1970.
KMP algorithm is elegant and efficient. Give any text and pattern, the overall time complexity is O(M+N), where M is the length of text and N is the length of pattern.
To understanding KMP, we need to explain with naive string search algorithm and prefix string.
Naive string search algorithm
Naive string search has two loops, we iterate each index from text and check each substring with pattern.
The overall time complexity is O(M*N).
|
The drawback of naïve algorithm is we fallback to the beginning of pattern every time. It will perform badly for such an input:
|
Each time we checked almost all the characters of pattern, but the last one is different, and then we restart the checking process from the beginning.
How to solve this issue?
Prefix and suffix string sets
To understand how to optimize naive string search, we need to know what is the common prefix and suffix substring.
Given any string, a prefix string set is a set contains all the prefix substring of it. For example, the pefix string set of “hello” is:
|
Suffix string set is similar but only contains suffix sub-string, for “hello”:
|
With these two sets, we could get the longest length of common string in two sets. For example:
|
How to use a common string of prefix and suffix substring to make string search quicker?
For each index of pattern, we build a table which contains the longest length of current substring.
For pattern “ccca”, the table is:
With text input text of “ccccccccca”, we can skip some part of pattern:
file:img/kmp-s1.gif
We even can skip multiple characters from pattern. Given the text of “ababababca”, let’s search the pattern of “abababca”:
file:img/kmp-s2.gif
KMP algorithm
With all above explaination, we can figure out KMP is similar with naive string search algorithm, but with an optimization of next
table. We use next
table to skip prefix parts of pattern:
|
Join my Email List for more insights, It's Free!😋