Hello ASM Coders,

I have a question.

I need a very simple code to search for the string "text" only in the first 2 KB (2048 bytes) at the beginning
of a file. If "text" is found, it should jmp foundtext.

Thank you.

Fred
Posted on 2002-12-28 13:14:14 by Fred
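	mov	edx, source
	mov	ecx, 2048-3		; last dword compare starts at offset 2044, so all 4 bytes stay inside 2 KB
	mov	eax, "txet"		; MASM constant; as a dword in memory this is the byte sequence "text"
@@:	cmp	eax, dword ptr [edx]
	je	foundtext
	inc	edx			; advance one byte at a time
	loop	@B
notfound:
	; display "text not found" message
	jmp	quit
foundtext:
	; text found
	jmp	quit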
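A C model of the loop above may help when checking the logic outside an assembler. It is a sketch, not the poster's code: find_text_dword, pack_le and SEARCH_LIMIT are hypothetical names. It steps one byte at a time and compares a 4-byte window against a single dword constant; pack_le builds that constant the way a little-endian dword load sees the bytes "text", which is why the MASM source writes it reversed as "txet". The loop stops at the last offset where all four bytes still fit inside the 2 KB window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEARCH_LIMIT 2048 /* assumption: only the first 2 KB are searched */

/* Packs 4 bytes the way a little-endian x86 dword load sees them:
   the first byte in memory lands in the low byte of the result. */
static uint32_t pack_le(const unsigned char *b) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Returns the offset of the first "text" lying completely inside the
   first SEARCH_LIMIT bytes of buf (len bytes available), or -1. */
long find_text_dword(const unsigned char *buf, size_t len) {
    static const unsigned char pattern[4] = {'t', 'e', 'x', 't'};
    const uint32_t key = pack_le(pattern); /* 0x74786574, same value as MASM "txet" */
    size_t limit = len < SEARCH_LIMIT ? len : SEARCH_LIMIT;

    assert(buf != NULL || len == 0);
    /* i + 4 <= limit keeps every 4-byte read inside the window */
    for (size_t i = 0; i + 4 <= limit; i++) {
        if (pack_le(buf + i) == key)
            return (long)i;
    }
    return -1;
}
```

For example, find_text_dword((const unsigned char *)"xxtext", 6) returns 2.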
Posted on 2002-12-28 13:29:56 by comrade
faster:


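	mov	ecx, -2048+2		; negative index, counts up towards 0
	lea	eax, [source+2048]	; eax -> one byte past the 2 KB window
@A:
	inc	ecx
	je	@NotFound		; ecx reached 0: whole window scanned
	cmp	byte ptr [eax+ecx], "t"	; test the LAST character first
	jne	@A
	cmp	byte ptr [eax+ecx-1], "x"
	jne	@A
	cmp	byte ptr [eax+ecx-2], "e"
	jne	@A
	cmp	byte ptr [eax+ecx-3], "t"
	jne	@A
;@Found:
;	lea	eax, [eax+ecx-3]	; eax -> start of "text"
	jmp	quit
@NotFound:
	; display "text not found" message
	jmp	quit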
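The trick in this version is that eax points just past the window and ecx is a negative index counting up to zero, so INC doubles as the loop test; it also tests the last character first. Here is a hypothetical C model of the same control flow (the function name and the limit parameter are assumptions, not part of the post):

```c
#include <assert.h>
#include <stddef.h>

/* end points one byte past the window and i is a negative index that
   counts up to 0, like ECX in the asm. i selects the LAST byte of a
   candidate, which is tested first, then the bytes before it. */
long find_text_negidx(const unsigned char *buf, size_t limit) {
    assert(buf != NULL || limit == 0);
    if (limit < 4)
        return -1;
    const unsigned char *end = buf + limit;
    for (ptrdiff_t i = -(ptrdiff_t)(limit - 3); i != 0; i++) {
        if (end[i] == 't' && end[i - 1] == 'x' &&
            end[i - 2] == 'e' && end[i - 3] == 't')
            return (long)((ptrdiff_t)limit + i - 3); /* offset of the first 't' */
    }
    return -1;
}
```

The first candidate checked has its last byte at offset 3 and the final one at offset limit-1, so every start offset from 0 to limit-4 is covered.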
Posted on 2002-12-28 17:34:15 by lingo12
Here's another take... modified from comrade's version:
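	mov	edx, source
	mov	ecx, 2048-4		; last offset where a whole dword fits in 2 KB
	mov	eax, "txet"
@@:
	cmp	eax, dword ptr [edx+ecx]
	je	foundtext
	dec	ecx
	jns	@B			; jns (not jnz) so that offset 0 is checked too
notfound:
	; display "text not found" message
	jmp	quit
foundtext:
	; text found (the last occurrence, since the scan runs backwards)
	jmp	quit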
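Because this loop counts down, the first hit it reports is the last "text" in the window, not the first. A dec ecx / jnz loop would also stop before testing offset 0, so this C sketch (find_text_backward is a hypothetical name) mirrors a dec/jns-style loop with a signed counter:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scans downward from the last position where "text" fits (limit-4) to 0.
   The signed index keeps going through 0, like DEC/JNS; a DEC/JNZ loop
   would exit before testing offset 0. Returns the offset of the LAST
   "text" in the window, or -1. */
long find_text_backward(const unsigned char *buf, size_t limit) {
    assert(buf != NULL || limit == 0);
    for (long i = (long)limit - 4; i >= 0; i--) {
        if (memcmp(buf + i, "text", 4) == 0)
            return i;
    }
    return -1;
}
```

With "text and text" it returns 9 (the second occurrence), which matters if the caller expects the first match.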
Posted on 2002-12-28 20:18:47 by MArtial_Code
MArtial_Code,

Your version is a bit faster because it replaces the slow LOOP (11 uops!!!) with dec ecx/jnz (2 uops),
but
"cmp eax, dword ptr [edx+ecx]" is very bad (slow) because:

"On PPro, PII and PIII, misaligned data will cost you 6-12 clocks extra when a cache line boundary is crossed." by A. Fog

i.e. you will pay a big penalty on every iteration where the dword read crosses a cache line boundary.
I understand why you like to use a dword rather than a byte,
but we must be rational rather than emotional.
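To put a number on how often that penalty hits: with 32-byte cache lines (the P6 family's L1 line size) and a buffer that starts on a line boundary (an assumption), a dword read advancing one byte at a time straddles a line at 3 of every 32 offsets. A small C helper (count_split_reads is a hypothetical name) does the arithmetic for the 2048-4 = 2044 loop:

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many 4-byte reads at byte offsets 0..last_start straddle a
   cache line, assuming the buffer starts on a line boundary. A read at
   offset p crosses a line when (p % line) + 4 > line. */
size_t count_split_reads(size_t last_start, size_t line) {
    assert(line >= 4);
    size_t splits = 0;
    for (size_t p = 0; p <= last_start; p++) {
        if (p % line + 4 > line)
            splits++;
    }
    return splits;
}
```

count_split_reads(2044, 32) returns 189, so 189 of the 2045 dword reads pay the split-line penalty.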



Comrade,

I know it is off topic, but I'm wondering why you finished with:
"Souz nerushimiy respublik svobodnih
Splotila naveki velikaya Rus
Da zdrastvuet sozdanniy volei narodov
Ediniy, moguchiy Sovietskiy Souz!"
("The unbreakable union of free republics
Great Rus' has united forever.
Long live the one created by the will of the peoples,
The united, mighty Soviet Union!")

I think the right way forward for the people of Eastern Europe
is a United States of Europe rather than a "velikaya" (great) Rus',
or a "great" Chechnya, or a "great" Turkey, or a "great" Kurdistan, or
a "great" Yugoslavia, and so on... Unfortunately, I could continue.

For me, "Ediniy, moguchiy Sovietskiy Souz!" ("The united, mighty Soviet Union!") means the "power to kill" people
of different "thinking", nationalities and religions, in Russia and outside it.

Here in Toronto live people of more than 70 nationalities, and
they speak more than 100 different languages. Of course, they have
their own communities and religions, but they respect each other because they are
free rather than "velikiy" (great), and believe me, they live and work well together.

I respect you as an asm programmer, but if you continue to write "lozungi" (slogans),
please translate them into English (if you can). It will be great fun...

Regards,
Lingo
Posted on 2002-12-28 22:23:39 by lingo12
Just had a play with an alternative way to find the word being searched for.

This returns the match position (an offset from lptext) in eax, or -1 if there is no match, ready for the next iteration. You must increment the position by at least 1 before passing it as the startpos parameter for the next search.



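Find_Text proc lptext:DWORD,lntext:DWORD,startpos:DWORD

    mov edx, lptext
    mov eax, lntext
    sub eax, 3          ; a 4-byte match must start at or before lntext-4
    add edx, eax        ; edx = lptext + lntext - 3
    neg eax
    add eax, startpos   ; eax = startpos - (lntext - 3), negative while in range
    jns nomatch         ; startpos is already past the last candidate

@@:
    cmp BYTE PTR [edx+eax], "t"
    je sublp
nextpos:
    inc eax
    js @B

nomatch:
    mov eax, -1 ; no match
    ret

sublp:
    cmp BYTE PTR [edx+eax+1], "e"
    jne nextpos
    cmp BYTE PTR [edx+eax+2], "x"
    jne nextpos
    cmp BYTE PTR [edx+eax+3], "t"
    jne nextpos

    add eax, lntext     ; match: convert back to an offset from lptext
    sub eax, 3
    ret

Find_Text endp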
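A C model of the same interface may make the calling convention clearer. It returns the offset of the first "text" at or after startpos, or -1, and each result plus 1 is fed back in as the next startpos. find_text_from and count_text are hypothetical names, and this sketch only tests start positions where all four bytes fit inside len.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the first "text" at or after startpos, or -1.
   Only start positions with all 4 bytes inside len are tested. */
long find_text_from(const char *text, long len, long startpos) {
    assert(text != NULL || len == 0);
    for (long i = startpos; i + 4 <= len; i++) {
        if (text[i] == 't' && text[i + 1] == 'e' &&
            text[i + 2] == 'x' && text[i + 3] == 't')
            return i;
    }
    return -1;
}

/* Counts every match by passing each result + 1 back in as startpos. */
int count_text(const char *text, long len) {
    int count = 0;
    long pos = find_text_from(text, len, 0);
    while (pos != -1) {
        count++;
        pos = find_text_from(text, len, pos + 1);
    }
    return count;
}
```

Because the next search starts only 1 byte later, overlapping matches are found too: count_text("textext", 7) returns 2.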

Regards,

hutch@movsd.com
Posted on 2002-12-29 05:39:01 by hutch--
Having looked at the posted code in this thread, Lingo12's fall-through design has one fewer jump in the matching code, so on a text with many mismatches his code will be faster than the one I posted.

Regards,

hutch@movsd.com
Posted on 2002-12-29 17:33:08 by hutch--
Well, if it's only used for a 2 KB range, I'd suggest scasd.
Posted on 2002-12-29 18:35:51 by JimmyClif
Jimmy,

Makes sense to me. I can only wonder how you would benchmark 2 KB of text with these types of algos. :tongue:

Regards,

hutch@movsd.com
Posted on 2002-12-30 01:59:21 by hutch--
Not tested, but the idea here is to look at the first "t" and the last "t"; if either of them doesn't match, advance 4 bytes. No need to check for "e" and "x" if the last character is not "t".


mov eax, memoryaddresshere
mov ecx, 2048
__check:
cmp BYTE PTR [eax+ecx], "t"
jne __next
cmp BYTE PTR [eax+ecx-3], "t"
jne __next
cmp BYTE PTR [eax+ecx-1], "e"
jne __next
cmp BYTE PTR [eax+ecx-2], "x"
jne __next
jmp __found
__next:
sub ecx, 4
jns __check

;Not Found

__found:
Posted on 2002-12-30 02:31:05 by stryker
Oops, mistake... :grin: I don't know what I was thinking... :grin: Forget the algo above...
Posted on 2002-12-30 02:41:05 by stryker
Oops, mistake again: the idea above works... it just needs some tweaking... :grin:
    mov     eax, OFFSET txt

__check:
cmp BYTE PTR [eax], "t"
je __continue
inc eax
jmp __check
__continue:
cmp BYTE PTR [eax+3], "t"
jne __next
cmp BYTE PTR [eax+2], "x"
jne __next
cmp BYTE PTR [eax+1], "e"
jne __next
jmp __found
__next:
add eax, 4
cmp eax, SIZEOF txt
jb __check
jmp __exit
__found:

invoke MessageBox, 0, 0, 0, 0

__exit:
There, that should work, but I can't guarantee it because I didn't test it... maybe someone will modify it to search only the first 2048 bytes, like the one I did above... anyway, I have to go and rest... :grin:

One case where this algo will not work: a run of t's, like "tttext".

Buggy code - forget it... :grin:
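The failure on "tttext" comes from the 4-byte skip: after a "t" that turns out not to start a match, the next "text" may begin 1 to 3 bytes later and is never tested. Below is a C sketch contrasting a byte-by-byte scan with a model of the skip-by-4 idea. Both functions are hypothetical, and the bounds check is an addition (the posted __check loop has none).

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte reference scan: tests every start offset. */
long scan_step1(const char *s, long len) {
    for (long i = 0; i + 4 <= len; i++)
        if (s[i] == 't' && s[i + 3] == 't' && s[i + 1] == 'e' && s[i + 2] == 'x')
            return i;
    return -1;
}

/* Model of the "find a 't', check the last 't', otherwise jump 4" idea:
   after a failed candidate it skips 4 bytes, so a "text" starting 1-3
   bytes after the failed 't' is never tested. */
long scan_skip4(const char *s, long len) {
    assert(s != NULL || len == 0);
    long i = 0;
    while (i + 4 <= len) {
        if (s[i] != 't') { i++; continue; }        /* __check: hunt for a 't' */
        if (s[i + 3] == 't' && s[i + 2] == 'x' && s[i + 1] == 'e')
            return i;                               /* __found */
        i += 4;                                     /* __next: add eax, 4 */
    }
    return -1;
}
```

On "tttext" the byte-by-byte scan returns 2, while the skip-by-4 model jumps from offset 0 to offset 4 and reports -1.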
Posted on 2002-12-30 02:49:22 by stryker
I'm just trying to introduce the practical aspect here :grin:

Cheers,
Jimmy
Posted on 2002-12-30 07:51:12 by JimmyClif
JimmyClif,

"Well if it's only used for a 2K range I'd suggest scasd."

scasb OK, but scasd..!?

And he asked for
"I need a very simple code..."
rather than suggestions.


Regards,
Lingo
Posted on 2002-12-30 09:09:22 by lingo12
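	mov	edi, source		; scasd compares eax with [edi], so the pointer goes in edi, not esi
	cld				; scan forward
	mov	ecx, 2048 shr 2		; 512 dwords
	mov	eax, "txet"
	repne	scasd
	je	foundtext
	; not found text
	jmp	quit
foundtext:
	; found text (edi now points 4 bytes past the match)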


This will only work when "text" starts on a 4-byte boundary (relative to source), since scasd steps by 4 bytes.
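To see the limitation concretely, here is a C model of what repne scasd tests: one dword per step, at offsets 0, 4, 8 and so on. scan_dword_stride is a hypothetical name, and the sketch assumes the scan starts at source itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of REPNE SCASD over the first `limit` bytes: one dword compare
   per step, advancing 4 bytes, so only offsets 0, 4, 8, ... are tested.
   Returns the offset of the match, or -1. */
long scan_dword_stride(const unsigned char *buf, size_t limit) {
    assert(buf != NULL || limit == 0);
    for (size_t i = 0; i + 4 <= limit; i += 4) {
        if (memcmp(buf + i, "text", 4) == 0)
            return (long)i;
    }
    return -1;
}
```

On "abcdtext" it finds the match at offset 4, but on "abtextcd" it reports -1 because "text" starts at offset 2.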
Posted on 2002-12-30 11:22:11 by comrade
(I hadn't thought about the 4-byte boundary) <-- oops :grin:
Posted on 2002-12-30 15:34:16 by JimmyClif