During the recent BM algo bashing, I decide to write my own string search algo. This is the ASM hybrid version of the Knuth-Pratt-Morris algorithm. I don't know if this is fast or slow. I just want to get away from the BM fuss that's been going around this forum. I'm not saying that this is fast, I'm just implementing an ASM hybrid version of the Knuth-Pratt-Morris algorithm. I also can't guarantee the quality of the algo, because I created this when the BM algo contest was starting. I tried every possible way to find a bug, but found none.




[size=9]
.586
.MODEL FLAT, STDCALL
OPTION SCOPED
OPTION CASEMAP:NONE

INCLUDE \masm32\include\kernel32.inc
INCLUDELIB \masm32\lib\kernel32.lib
INCLUDE \masm32\include\user32.inc
INCLUDELIB \masm32\lib\user32.lib

.DATA

TextString DB "A very very very very very very very very very very very very long string", 0
TextToFind DB "strin", 0

.CODE

Start:

xor eax, eax
xor ecx, ecx
xor edx, edx
lea esi, TextString
lea edi, TextToFind

@@ScanTextString:

mov dl, BYTE ptr [edi]
cmp dl, 0h
je @@ExitScanning
mov al, BYTE ptr [esi]
cmp al, 0h
je @@ExitScanning
cmp al, dl
jne @@ContinueScanning
inc esi
inc edi
mov ecx, 1
jmp @@ScanTextString
@@ContinueScanning:

xor ecx, ecx
lea edi, TextToFind
inc esi
jmp @@ScanTextString

@@ExitScanning:

or edx, edx
jne @@Exit
or ecx, ecx
je @@Exit

invoke MessageBox, 0, 0, 0, 0

@@Exit:
invoke ExitProcess, 0h

END Start
[/size]



I hope this will end the so called BM algo contest that's been in the forum, and I'm sick about it this "prove me wrong", "prove me right" ...

This program will popup a message box if it finds a match, else it just exit the program.


I hope this help!!!


At least we have an alternative!!! :)
Posted on 2002-02-26 01:15:59 by stryker
umberg6007,

Compliments on producing the KPM algo, I don't think from memory that it has the legs in as many situations as the generic BM but it is supposed to have some advantages searching binary files that do not have a huge range of difference.

I would be interested to see how it benchmarks on more specialised data that suits its design.

Regards,

hutch@movsd.com

Sorry about the kerfuffle on the BM algo family, all I did was publish one last year.
Posted on 2002-02-26 02:58:20 by hutch--
1.It is not KMP algo.
2.BM algo took many thing from KMP it was kind of evalution of KMP so they have much in common. For example it also has shift tbl.
3. F00der test indeed had subject to test very much for KMP.
(KMP is good when there is a lot simular word for start part)
Posted on 2002-02-26 09:04:51 by The Svin
That's why I call it hybrid cause it's a little slower than the original design :) but hey it's an alternative :) And yes it's the KPM algo(hybrid).

Credit goes to this site http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/kpm-example.html
Posted on 2002-02-26 09:25:16 by stryker
or even a faster one:




[size=9]
.586
.MODEL FLAT, STDCALL
OPTION SCOPED
OPTION CASEMAP:NONE

INCLUDE \masm32\include\kernel32.inc
INCLUDELIB \masm32\lib\kernel32.lib
INCLUDE \masm32\include\user32.inc
INCLUDELIB \masm32\lib\user32.lib

.DATA

TextString DB "A very very very very very very very very very very very very long string", 0
TextToFind DB "strin", 0

.CODE

Start:

xor eax, eax
xor ecx, ecx
xor edx, edx
lea esi, TextString
lea edi, TextToFind

@@ScanTextString:

mov dl, BYTE ptr [edi]
[b]or edx, edx[/b]
je @@ExitScanning
mov al, BYTE ptr [esi]
[b]or eax, eax[/b]
je @@ExitScanning
cmp al, dl
jne @@ContinueScanning
inc esi
inc edi
mov ecx, 1
jmp @@ScanTextString
@@ContinueScanning:

xor ecx, ecx
lea edi, TextToFind
inc esi
jmp @@ScanTextString

@@ExitScanning:

or edx, edx
jne @@Exit
or ecx, ecx
je @@Exit

invoke MessageBox, 0, 0, 0, 0

@@Exit:
invoke ExitProcess, 0h

END Start
[/size]

Posted on 2002-02-26 12:03:06 by stryker
Congratulations...
It is a good little search proc that is useable in many cases in term of speed...

The algo is pretty logic, and I like how much compact it is... and easy to follow and understand !

Thanks ! :alright:
Posted on 2002-02-26 12:42:57 by JCP
Thanks Readiosys!!! :alright:
Posted on 2002-02-26 12:47:02 by stryker
This is a optimized/step up version of the algo above. This one will give the byte order of your string, for example:



[size=9]
Actual Byte Order
[ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15][16][NA]

Byte String
[ H][ e][ l][ l][ o][ ][ C][ r][ u][ e][ l][ ][ W][ o][ r][ l][ d][0h]

Possible returned values(we need to subtract by 1)
[ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15][16][17][NA]
[/size]



If you go and search for the string "ue" it will return the value 9 in eax. You might be wondering why 9(One of the possible returned values) since string "ue" starts at 8(Actual Byte order), this is built in the algo that it automatically starts the counter at 1 and scans through the whole entire string source - 1. In the end, counter ends up 1 more than the actual string that was scanned. I'll leave this burden to the programmer to subtract 1 from the returned value. This is also to eliminate ambiguities since If a match is found at the first byte, eax might return a value 0, 0 here can either tell us that a match is found on the first byte of the string or we haven't found a match at all.



[size=9]
.586
.MODEL FLAT, STDCALL
OPTION SCOPED
OPTION CASEMAP:NONE

INCLUDE \masm32\include\kernel32.inc
INCLUDELIB \masm32\lib\kernel32.lib
INCLUDE \masm32\include\user32.inc
INCLUDELIB \masm32\lib\user32.lib
INCLUDE \masm32\include\masm32.inc
INCLUDELIB \masm32\lib\masm32.lib

.DATA

TextString DB "Hello Cruel World", 0
TextToFind DB "ue", 0
Result DB 9 DUP(0)
Buffer DB 20 DUP(0)
Found DB "String found at byte number: ", 0
NotFound DB "String not found.", 0

.CODE

KPMSearch PROC USES ebx esi edi srcStr:DWORD, destStr:DWORD

xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
mov esi, srcStr
mov edi, destStr

@@KPMSCAN:

inc ecx
mov dl, BYTE PTR [edi]
or edx, edx
je @@KPMSTOPSCAN
mov al, BYTE PTR [esi+ecx-1]
or eax, eax
je @@KPMSTOPSCAN
cmp al, dl
jne @@KPMRESETSCAN
inc edi
or ebx, ebx
jnz @@KPMSCAN
mov ebx, ecx
jmp @@KPMSCAN

@@KPMRESETSCAN:

xor ebx, ebx
mov edi, destStr
jmp @@KPMSCAN

@@KPMSTOPSCAN:

or edx, edx
jne @@KPMEXIT
or eax, eax
je @@KPMEXIT
mov eax, ebx
ret

@@KPMEXIT:

xor eax, eax
ret

KPMSearch ENDP
Start:

invoke KPMSearch, OFFSET TextString, OFFSET TextToFind
or eax, eax
je @@Exit
push eax
invoke lstrcat, OFFSET Buffer, OFFSET Found
pop eax
dec eax
invoke dwtoa, eax, OFFSET Result
invoke lstrcat, OFFSET Buffer, OFFSET Result
invoke MessageBox, 0, OFFSET Buffer, 0, 0
invoke ExitProcess, 0h
@@Exit:
invoke MessageBox, 0, OFFSET NotFound, 0, 0
invoke ExitProcess, 0h

END Start
[/size]



One question though for optimizing gurus, which is faster inc esi or ... ?

Thanks!!!



Happy ASMing!!!
Posted on 2002-02-26 18:34:30 by stryker

One question though for optimizing gurus, which is faster inc esi or ...?
Not that I'm a guru, but the purpose of using the complex addressing modes is to reduce multiple instructions. Here is an example:
@@:

mov al, [esi]
inc esi
mov [edi], al
inc edi
dec ecx
jne @B

; becomes...

@@:
mov al,[esi+ecx]
nop ; do something useful here...
mov [edi+ecx], al
inc ecx
jne @B
This is a simple example, but you get the idea? :)
Posted on 2002-02-26 19:06:15 by bitRAKE
Thanks, rake. I thought using addressing... would slow down a routine. Thanks for clearing that up. :alright:

I'll change the routine above to a much more clearer code. It looks like a mess right now. :)
Posted on 2002-02-26 19:35:47 by stryker
umberg6007,

Something I found is that the clock is the best teacher, its worth playing with alternative instructions and order as it will often change the times for you, some faster and some slower but often useful in development.

Keep up the good work, a pleasure to see a new implementation of this algo.

Regards,

hutch@movsd.com
Posted on 2002-02-26 19:47:19 by hutch--
Thanks hutch-- :alright:
Posted on 2002-02-26 19:50:03 by stryker
Oh I missed something :: remove this line xor ebx, ebx you don't need this. Thus saving 1 cycle :)
Posted on 2002-02-26 23:41:33 by stryker
I wrote it on paper just to show what is KMP about.
Your code is not KMP and not hybrid of KMP.
I don't say it is bad it just have nothing to do with KMP
This kinda slappy code, I'm not free to submit optimized versions
hope you can figure out how to do it.


.data

p db 'abcdwij'
s db 'agfdfgfdabcddfdsasdfdsabcdwihabcdwijbcddfd'
.data?
d db 256 dup (?)
.code
mov edx,sizeof p -1
mov cl,-1 ;ecx = k
xor esi,esi
dec edx ;edx = len-1
mov d,cl

falsee:
;j:=j+1 ; k:=k+1
inc esi
inc cl
;IF p[j]==p[k] then d[j]:=d[k] else d[j]=k
mov al,p[esi]
cmp al,p[ecx]
mov al,cl ;al = k
jne @F ;p[j]<>p[k]
mov al,d[ecx]
@@: mov d[esi],al


;While j<M-1
;Begin_block1

cycl1: cmp esi,edx ;j<M-1 ?
jge exit_d
;While ((k>=0) &&(p[j]<>p[k] k:=d[k] ;esi=j
test cl,cl
js falsee
mov al,p[esi]
cmp al,p[ecx]
je falsee
mov cl,d[ecx]
jmp cycl1
exit_d:
;i=0 ; j=0 esi=lenD edx=lenP edi = i ecx = j
xor edi,edi
xor ecx,ecx
mov esi,sizeof s
inc edx
;While (j<M) &&(i<N) do
@extest:
cmp ecx,edx
jge @end
cmp edi,esi
jge @end
;While (j>=0) && (s[i]<>p[j]) j=d[j]

@s: mov al,s[edi]
cmp al,p[ecx]
je @inc
mov cl,d[ecx]
;j=j+1 i=i+1
@inc: inc edi
inc cl
jmp s
@end:
cmp ecx,edx
je @found
mov eax,-1
; ret
@found:
mov eax,edi
sub eax,edx
; ret
Posted on 2002-02-27 00:37:26 by The Svin
hmmm! Actually there are variations to it. But as for how KMP algorithm compares and processes strings, I think I have done it. Check this site too http://www-igm.univ-mlv.fr/~lecroq/string/node8.html this is how I got the idea. There's a visualization on how it works. But anyway I'll check the code, I'll take a look at that source and see what I can do.

If it's "not really KMP" then I'm sorry for the misundertanding. At least we have a new algo around and that's the most important. Should I call it uSearch algorithm? :)
Posted on 2002-02-27 00:52:21 by stryker
Included here is a screen shot from a java(everybody's going nuts over this "java" :) ) program that does this KPM/KMP algorithm. And I must say my code has been consistent on how KPM works, how it searches, scans, and compares bytes. If you have an idea or correction please tell me, I might have overlook something. Since the gif is way too big in dimensions, I just zipped it up.
Posted on 2002-02-27 01:40:54 by stryker
Ahh, one flaw on my program, if you have this search string hellodworld and you searched for d it doesn't continue to search all instances of d in the string. It stops on the first instance of d, if you have a program that does a searching routine you don't need to find all d instances of the string, all you need is the first instance and make the user picks his/her choice if he/she wants to continue searching for the second instance... So I need to refine my code a little bit more to support this kind of feature...I'll be coding up soon, I have to get some sleep!!! :) Yawn :)
Posted on 2002-02-27 02:07:08 by stryker
Uhm, I don't think such a feature is really needed since to search more occurences you just have to do

(string address + value returned by the last call of your searching proc), and recall your proc with this string address...

I'm not sure I'm very clear, but do you see what I mean ?

Or maybe you can specify the address to an array of dwords as a proc parameter to put the address of each found occurence... It may be faster than the first solution...
Posted on 2002-02-27 05:10:40 by JCP
umberg6007,

Thw way you have done it is usually the right way to write a search algo so if you want to search for a following occurrence of a string, either you set the offset of the file to the next position manually or ad an extra parameter so you can call it at the next position.

If you wanted to scan all occurrences in one pass to get the count, you would loop through the entire code incrementing a counter and return the counter value at the end.

Regards,

hutch@movsd.com
Posted on 2002-02-27 17:39:19 by hutch--
Thanks hutch, I already have the thing going on! I'll post the results in a few days after I'm finish commenting it.

Readiosys, you're right I don't need to add the features as I stated on earlier posts.

BTW, I ran a lot of tests, I found out that you really need the xor ebx, ebx sorry for the false alarm :)
Posted on 2002-02-27 17:46:21 by stryker