Assuming that I only use UPPERCASE strings. What would be the fastest algoritm to search a list of lets say 5000 upercased strings for an exact match??

any links would be appreciated.
Posted on 2001-12-13 13:00:15 by dxantos
Boyer Moore ?

Hutch did some very good ASM implementations of it.

http://www.movsd.com/bm.htm
Posted on 2001-12-13 13:25:00 by JCP
boyer moore??? what about a simple cmpsb...



.DATA
str1 db "tEST1",0
str2 db "TEST1",0

cap db "example",0
msg db "not equal",0

.CODE
start: mov eax, offset msg

mov edi, offset str1
mov esi, offset str2
mov ecx, 5
repz cmpsb

jnz @F

add eax, 4

@@: invoke MessageBox,0,eax,addr cap,0
Posted on 2001-12-13 14:58:38 by mob
If your goal is speed then no cmpsb - use BM.
Posted on 2001-12-13 15:57:13 by bitRAKE
First let me say that BM is flat out fast fast fast. But it's a tool that's best in a niche. That niche is for an exact string searchs with a single pattern on a large-ish file.

Problem here is that if you're looking for many patterns against big text you can sometimes do a bit better using a FSM variant of Knuth-Morris-Pratt rather than passing over the file many times. Yes KMP is slower if your a doing single pattern search but in general the benefits of the extra BM skips will diminish as you add more patterns to the mix. A properly designed KMP-FSM will only have to pass over the file once & you can make it so that you only look at any text char in the file being searched no more than once.

Damn! someone at work walked off with the library book on string searching algos! I'll have to get back to you on the details when I can look at my own books. I believe(?) that you can do this with a BM-FSM but that there are no benefits for the added overhead. But I'd better look it up & be sure here.
Posted on 2001-12-13 17:02:38 by rafe
hm i never used one of those algoritms... i looked
at the source it seemed very big to me so i can't
anylyse or understand it in 5 minutes but ~why~ is
it faster? i don't know if you mean true pattern matching
or simple fixed string matching? whats the technique
behind this method?
Posted on 2001-12-13 17:44:35 by mob
oops sorry i did'nt saw hutch's explanation... will
read it now and hope i'm able to understand it :)
Posted on 2001-12-13 17:49:11 by mob
use bm :)

-but only if the list of strings is linked togehter in
something like a buffer or file... if not you know
that the base to search will always be 0 so a
simple loop would be better (btw yeah cmpsb
is slower than cmp...)
Posted on 2001-12-14 01:21:44 by mob