Look at the MB specs, I don't see how it's any faster than a normal search... if I have



This is only a test of the emergency broadcast system.
|
System


and only compare to match against the first byte and if there is a match move on the the second byte, in this case a space and a y don't match, so it would fail... would be no different than comparing the m to an o (in the same above example) and failing... then moving along until there is or isn't a match... then if I must return the position, I already know the starting position.

It seems to me that it's the same difference, whether or not I choose to match the first or the last letter...


Thanks,
_Shawn
Posted on 2001-11-06 09:41:05 by _Shawn
Shawn,
B/M *IS* lightning fast for exact string matches & where the setup overhead doesn't kill you. Of course I spend my life doing inexact string matches & tree builds blah blah... so it's more useful to me for document processing but it's a very pretty algo nevertheless... creative thinking to an old problem & presto-chango massive speedup. :) AND they came up with this approach pretty late in the game... OK before a lot of you were born :D

Comparing backward maximizes the amount you can skip forward by maximally capitalizing on mismatches. Mismatches are where the real info is... Comparing forward schemes don't do maximize the info gleaned from the mismatch & (if i remember correctly... Hutch?) you can only move forward upto the # chars already compared. In the example you've given you can only move forward one char but by comparing backward you can move the whole word length forward because you KNOW "o" isn't in "system".

This is really Hutch's baby so I'll defer to him on this... it's been soooo long since I've played with this bute.
Posted on 2001-11-06 15:18:31 by rafe
rafe, is certainly correct. If you start at the begining of the search string then you start comparing at the end of the match string, and visa-versa (if you start at the end of the search string then you start comparing the begining of the match string). Less comparisons means less branches, and less branching means faster execution. This is becoming more and more academic in software given good data modeling techniques. The majority of string comparisons are for short string, short matchs - which means the overhead of BM is too much.Posted on 2001-11-06 20:13:57 by bitRAKE
Shawn,

The original design by Bob Boyer and coded by Moore was a very clever design because it did the pattern matching in reverse. The win in doing this is if you find a mismatch, you can jump past it and start comparing again. Ther generic BM algo is better suited for long patterns as it gets faster as the pattern gets longer because it can make larger jumps.

I published 3 versions of it in the last APJ, a normal BM, a Horspool variation and a simplified variation and on average, they will rip the titz off a classic byte scanner. The original BM has the best averages with different paterns and different sources but the two variations have advantages on different hardware.

Overhead is equivelant to about 300 character matches and because of the design, the threshold for speed improvement is a pattern length of about 5 characters or larger. These factors will determine if you use a BM algo or not, on shorter patterns or small sources, a properly written byte scanner is faster.

On my PIII 600, I was getting about 250 meg/sec search speed on first pass, on this P4 I am setting up, its over 1 gig / sec. If you have to do repeated searching like virus patterns on the same file, the speed goes up considerably.

I have included the three algos in the next version of MASM32 so they should be of use to anyone who is searching large data sources for exact pattern matches.

Regards,

hutch@movsd.com
Posted on 2001-11-07 05:29:54 by hutch--