What I have available is both types, a brute force search and a Boyer Moore algo as it is useful to have both because they have overlapping capacity. I have adapted Jeremy Collake's version of a Boyer Moore to suit the MASM32 library and just got back his approval for it. The difference is in search string length and startup overhead, a boyer moore has to load a shift table which prevents it from being useful in recursive searching and it is not as fast on short search patterns. Testing the two algorithms I have, the pattern length overlap is about 10 - 12 characters. Below that, the brute force algo comes back faster most of the time, over that length, the boyer moore becomes progressively faster. The brute force one is more general purpose as it starts faster but if you need to search very long sections of data for long phrases or BYTE patterns, the boyer moore has a big performance advantage. Regards, email@example.com
I messed up: I meant Boyer-Moore algo doesn't do as many comparisons. :) The link that I added to the bottom of my message above is what you speak of.
Just so you know Hutch: Jeremy Collake's version of a Boyer Moore, is actually Horspool algorithm - which is a simplified variation. It really doesn't perform as well as the full implementation. I'm working out the details of an improved version of Boyer Moore: Turbo-BM. The code will be availible when I'm done. Boyer-Moore has even more overhead than Horspool algorithm. This message was edited by bitRAKE, on 4/11/2001 10:08:45 PM
Bitrake, Thanks for digging that one out, I know that Jeremy Collake's version is a port of the DOS example by Abrash and I knew it was doing a few things different but it certainly performs well. I have seen most of the stuff around on the net including the java demos but not much available in decent technical articles, I have not been able to find Bob Boyer's original paper and he can't either so the work I have done so far has been nutted out the hard way. I have most of it going but I don't know the technique used for the additional heuristics and thius leaves me with a few problems with repeat sequences like x!!!!!. Let us know when you get it going, it will be interesting to see how fast it is with the additional overhead. Regards, firstname.lastname@example.org
Steve, 1. I'm glad you found the way - how did you get the thought of alignment? What brought you to it? 2. I've looked at your InString proc and I think it's possible to make it 20-30% faster. Please, try to determine size of string while scaning. You don't need to make StrLen first - while searching for first byte of substr and while mutching strings check byte for searched bytes and for zero as control byte to finish in the same iteration. The Svin.
Alex, The InString proc basically needs a re-write and what I had in mind was using the new one and writing a wrapper to it as it is a fair bit faster than the search method in InString. I originally cut the search out of it and made it into a seperate proc but the ones we now have are a lot faster. I am not really in a mad rush to do it as the two available now can be used for text search with no problems, it just means writing the text adressing code first and then calling either the brute force algo or the Boyer More algo. Regards, email@example.com
Hutch, I just convert the C code and read every shred of information I can find. The part that the algorithm lacks is the good suffix skipping - everytime you match a character in the string that appears later in the string, and it's not a perfect match; the algorithm doesn't skip forward to that byte - it only skips forward on initial bad bytes. I just need more time to code. I never should have got another girlfriend. :P Just got laided off and the computer crashed last week! The hard drive with all my work is sitting on a stack of books until I have time to figure out what exactly went wrong, and salvage what I can! :( I want to go back to school next semester, too. :) Life is fun when you hit some friction. :)