KMP 算法在查找字符串中所有出现的模式时具有线性复杂度,如 Boyer-Moore 算法1。如果您尝试在“aaaaaaaaa”这样的字符串中查找“aaaaaa”这样的模式,一旦获得第一个完整匹配,
aaaaaaaaa
aaaaaa
aaaaaa
^
the border table contains the information that the next longest possible match (corresponding to the widest border of the pattern) of a prefix of the pattern is just one character short (a complete match is equivalent to a mismatch one past the end of the pattern in this respect). Thus the pattern is moved one place further, and since from the border table it is known that all characters of the pattern except possibly the last match, the next comparison is between the last pattern character and the aligned text character. In this particular case (find occurrences of am in an), which is the worst case for the naive matching algorithm, the KMP algorithm compares each text character exactly once.
在每个步骤中,至少有一个
- 比较文本字符的位置
- 模式的第一个字符相对于文本的位置
增加,并且永远不会减少。比较的文本字符的位置最多可以增加length(text)-1
次,第一个模式字符的位置最多可以增加length(text) - length(pattern)
次,所以该算法最多需要2*length(text) - length(pattern) - 1
steps.
预处理(边界表的构建)最多需要2*length(pattern)
步骤,因此整体复杂度为 O(m+n),仅此而已m + 2*n
步骤被执行,如果m
是图案的长度,n
文本的长度。
¹ Note that the Boyer-Moore algorithm as commonly presented has a worst-case complexity of O(m*n) for periodic patterns and texts like am and an if all matches are required, because after a complete match,
aaaaaaaaa
aaaaaa
aaaaaa
^
<- <-
^
整个模式将被重新比较。为了避免这种情况,您需要记住在完全匹配之后的移位之后模式的前缀仍然匹配多长时间,并且只比较新字符。