On Obtaining Knuth, Morris, and Pratt's String Matcher by Partial Evaluation

We present the first formal proof that partial evaluation of a quadratic string matcher can yield the precise behaviour of Knuth, Morris, and Pratt's linear string matcher.

Obtaining a KMP-like string matcher is a canonical example of partial evaluation: starting from the naive, quadratic program checking whether a pattern occurs in a text, one ensures that backtracking can be performed at partial-evaluation time (a binding-time shift that yields a staged string matcher); specializing the resulting staged program yields residual programs that do not back up on the text, a la KMP. We are not aware, however, of any formal proof that partial evaluation of a staged string matcher precisely yields the KMP string matcher, or in fact any other specific string matcher.

In this article, we present a staged string matcher and we formally prove that it performs the same sequence of comparisons between pattern and text as the KMP string matcher. To this end, we operationally specify each of the programming languages in which the matchers are written, and we formalize each sequence of comparisons with a trace semantics. We also state the (mild) conditions under which specializing the staged string matcher with respect to a pattern string provably yields a specialized string matcher whose size is proportional to the length of this pattern string and whose time complexity is proportional to the length of the text string. Finally, we show how tabulating one of the functions in this staged string matcher gives rise to the `next' table of the original KMP algorithm.

The method scales for obtaining other linear string matchers, be they known or new

Abstract: