![]() |
|||||||||||
|
|
Rabin-Karp Algorithm Problem Description Given a string P,T. We want to check whether P is a substring of T. Algorithm RabinKarp algorithm performs well in practice and also generalized to other algorithm for related problems, such as two-dimensional pattern matching. The worst case running time is O((n-m+1)*m). Let's assume that the character used in string T is a set S { 0,1,2,...,9 }. We can view a string of k consecutive character as decimal numbers. For example, string "31415" corresponds to the number 31415. Given a pattern P[1..m], let p denote its corresponding decimal value. Given a text T[1..n], we let t(s) denote the decimal value of length-m substring T[s+1..s+m] for s = 0,1,...,n-m. Obviously, t(s) = p if and only if T[s+1..s+m] = P[1..m]. We can p compute in O(m) time using Horner's Rule
pseudo code for above assumption is
We can compute all t(s), for s = 0,1,...,n-m values in a total of O(n) time. The value of t(0) can be similarly computed from T[1..m] in O(m) time. To compute the remaining values t(1), t(2), ..., t (n-m), observe that t(s+1) can be computed can be computed from t(s) in constant time.
For example, T = "123456" and m = 3 t(0) = 123 t(1) = 10*(123 - 100*1) + 4 = 234 Step by Step explanation First : remove the first digit : 123 - 100*1 = 23 Second: Multiply by 10 to shift it : 23 * 10 = 230 Third : Add last digit : 230 + 4 = 234 The algorithm runs by comparing, t(s) with p. When t(s) = p, then we have found the substring P in T, starting from position s. Generalization The only problem with above explanation is t(s) and p may be too large, so that no built-in data type can fit them. The solution is we required all t(s) and p be performed in modulo q. In general, if we have d-ary alphabet { 0,1,...,d-1 }, we could have 26 alphabet { a, b, ..., z }, we choose q so that dq fits within a computer word. We can work out p modulo q by using this pseudo code
and computation of t(s+1) become
The weakness of this method is that, ts = p (mod q) doesn't imply that ts = p. On the other hand, if ts != p (mod q), we definitely know that ts != p. So we can use ts != p (mod q) as a fast heuristic test to rule out invalid shifts s. But if we have ts = p we have have to check whether T[s+1...s+m] = P[1..m] Implementation in C/C++
|
||||||||||
| Copyright ©
2004 - Harvest Software |
|||||||||||