Boyer-Moore Search Algorithm

Question

Paul Huter am 21 Nov. 2012

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/54309-boyer-moore-search-algorithm

Beantwortet: Sunny Choudhary am 10 Jul. 2023

Does anyone know of an accurate, easy-to-use source for a Boyer-Moore algorithm? Preferably something I can code up in MATLAB, or actual MATLAB code/function.

Thanks.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Sunny Choudhary am 10 Jul. 2023

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/54309-boyer-moore-search-algorithm#answer_1270583

Here is a good source of a Boyer-Moore search algorithm

Main features

Performs the comparisons from right to left
Pre-processing phase in O ( m + n) time and space complexity
Searching phase in O ( mn ) time complexity
3n text character comparisons in the worst case
O(n / m) best performance

Description

It is considered as the most efficient string matching algorithm. A simplified version of it or the entire algorithm is used in text editors for search and substitute commands.

The algorithm scans the characters of the pattern from right to left beginning with the rightmost one. In case of a mismatch (or a complete match of the whole pattern) it uses two pre-computed functions to shift the window to the right.

The two shifts functions are

good suffix shift or matching shift : aligns only matching pattern characters against target characters already successfully matched.
bad character shift or occurrence shift : avoids repeating unsuccessful comparisons against a target character.

Assume that a mismatch occurs between the character x [ i ] = a of the pattern and the character y [ i + j ] = b of the text during an attempt at position j. Then, x [ i + 1 ... m -1 ] = y [ i + j + 1 ... j + m - 1 ] = u and x [ i ] not equals to y [ i + j ]. The good-suffix shift consists in aligning the segment y [ i + j + 1 ... j + m - 1 ] = x [ i + 1 ... m - 1 ] with its rightmost occurrence in x that is preceded by a character different from x [ i ] as shown in the fig 1.

fig 1 : The good-suffix shift, u re-occurs preceded by a character c different from a.

If there exists no such segment, the shift consists in aligning the longest suffix v of y [ i + j + 1 ... j + m - 1 ] with a matching prefix of x as shown in fig 2.

fig 2 : The good-suffix shift, only a suffix of u re-occurs in x.

The bad-character shift consists in aligning the text character y [ i + j ] with its rightmost occurrence in x [ 0 ... m - 2 ] as shown in fig 3.

fig 3 : The bad-character shift, a occurs in x.

If y [ i + j ] does not occur in the pattern x, no occurrence of x in y can include y [ i + j ], and the left end of the window is aligned with the character immediately after y [ i + j ], namely y [ i + j + 1 ] as shown in fig 4.

fig 4 : The bad-character shift, b does not occur in x.

Note that the bad-character shift can be negative, thus for shifting the window, the Boyer-Moore algorithm applies the maximum between the the good-suffix shift and bad-character shift. More formally the two shift functions are defined as follows.

The good-suffix shift function is stored in a table bmGs of size m+1.

Lets define two conditions :

Cs ( i , s ) : for each k such that i < k < m , s >= k or x [ k - s] = x [ k ] and
Co ( i , s ) : if s < i then x [ i - s ] not equals to x [ i ]

Then, for 0 <= i < m : bmGs [ i + 1 ] = min { s > 0 : Cs ( i , s ) and Co ( i , s ) hold }

and we define bmGs [ 0 ] as the length of the period of x. The computation of the table bmGs use a table suff defined as follows : 1 <= i < m , suff [ i ] = max { k : x [ i - k + 1 ......... i ] = x [ m - k ............. m - 1 ] }

The bad-character shift function is stored in a table bmBc of size . For c in : bmBc [ c ] = min { i : 1 <= i < m-1 and x [ m - 1 - i ] = c } if c occurs in x, m otherwise.

Tables bmBc and bmGs can be precomputed in time O( m + ) before the searching phase and require an extra-space in O( m + ). The searching phase time complexity is quadratic but at most 3n text character comparisons are performed when searching for a non periodic pattern. On large alphabets (relatively to the length of the pattern) the algorithm is extremely fast.

Note :

Boyer-Moore algorithm is very fast on large alphabet ( relative to the length of the pattern).
Not applicable for small alphabet ( relative to the length of the pattern ).
For binary strings KMP algorithm is recommended.
For the very shortest pattern, the brute force algorithm may be better.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Boyer-Moore Search Algorithm

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Antworten (1)

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

Boyer-Moore Search Algorithm

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Antworten (1)

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden