The KMP algorithm searches for the occurrence of a word W in a sentence S while employing the observation that when a mismatch occurs W contains sufficient information to determine the start position of the next search in S.
Say W = abcd and S = abcbabcd. Now we compare W[0] with S[0], W[1] with S[1], W[2] with S[2] and they all match. But W[3] does not match S[3]. The brute force algorithm would have started the next search by comparing W[0] with S[1]. But a simple observation tells us that it would be fruitless as there is no W[0] till S[3]. We would have done better by starting the next search from S[4]. And this is exactly what the KMP tries to achieve.
KMP requires a pre-processing of the word W. We create a Partial Match Table by iterating over all the letters of W in such a way that in case of a mismatch we can quickly figure out the next start position by looking at the table. The wiki article on KMP describes with an example how to create the table:
The partial match table can be created in O(n) time using O(n) space, where n is the length of the word W. Once we have the table we can iterate over each element of S consulting the table every time a mismatch occurs. You may want to read up the algorithm with examples from
KMP – Algorithm With Examples – Wikipedia
The search happens in O(k) where k is the length of the sentence S. Thus the algorithm runs in O(n + k).
Below is my java implementation of the same:
/** * Perform the Knuth-Morris-Pratt string search. The algorithm first pre-processes the word to create a Partial Match * Table. The table is creatd in O(n) where n is the length of the word. After the pre-processing, the actual search can * be performed in O(k) where k is the size of the sentence. */public class KMPStringSearch { /** * Searches for all occurances of the word in the sentence. Runs in O(n+k) where n is the word length and k is the * sentence length. * * @param word The word that is being searched * @param sentence The collections of word over which the search happens. * @return The list of starting indices of the matched word in the sentence. Empty list in case of no match. */ public List<Integer> searchString(final String word, final String sentence) { final List<Integer> matchedIndices = new ArrayList<>();
final int sentenceLength = sentence.length(); final int wordLength = word.length(); int beginMatch = 0; // the starting position in sentence from which the match started int idxWord = 0; // the index of the character of the word that is being compared to a character in string final List<Integer> partialTable = createPartialMatchTable(word); while (beginMatch + idxWord < sentenceLength) if (word.charAt(idxWord) == sentence.charAt(beginMatch + idxWord)) { // the characters have matched if (idxWord == wordLength - 1) { // the word is complete. we have a match. matchedIndices.add(beginMatch); // restart the search beginMatch = beginMatch + idxWord - partialTable.get(idxWord); if (partialTable.get(idxWord) > -1) idxWord = partialTable.get(idxWord); else idxWord = 0; } else idxWord++; } else { // mismatch. restart the search. beginMatch = beginMatch + idxWord - partialTable.get(idxWord); if (partialTable.get(idxWord) > -1) idxWord = partialTable.get(idxWord); else idxWord = 0; }
return Collections.unmodifiableList(matchedIndices); }
/** * Creates the Partial Match Table for the word. Runs in O(n) where n is the length of the word. * * @param word The word whose Partial Match Table is required. * @return The table as a list of integers. */ public List<Integer> createPartialMatchTable(final String word) { if (StringUtils.isBlank(word)) return Collections.EMPTY_LIST;
final int length = word.length(); final List<Integer> partialTable = new ArrayList<>(length + 1); partialTable.add(-1); partialTable.add(0);
final char firstChar = word.charAt(0); for (int idx = 1; idx < word.length(); idx++) { final int prevVal = partialTable.get(idx); if (prevVal == 0) { if (word.charAt(idx) == firstChar) partialTable.add(1); else partialTable.add(0); } else if (word.charAt(idx) == word.charAt(prevVal)) partialTable.add(prevVal + 1); else partialTable.add(0); }
return Collections.unmodifiableList(partialTable); }}
Popularity: 3% [?]