It seems something failed in the testing of the last version. Here is a version I have modified to use a design by EKO in which the comparison loop runs backwards. The advantage of his design is lower branch overhead when a first-character match is found.

It has a 4-instruction main loop, which is where the speed is, a 2-instruction pre-match setup thanks to EKO, and a 5-instruction match loop, so in general the loop code is efficient enough.

The prologue code can be made smaller using a design by Alex (The Svin), but I have not had time to do it. The real speed is in the loop code, so it's not a performance problem.

This version re-reads the 1st character in the matching loop so that it works properly on a single-character pattern. For patterns longer than 1 character you could start on the next character, but it is common when working with ASCII text to search for a single character, so the extra iteration is necessary.

Regards,

hutch@movsd.com



; #########################################################################

InString proc startpos:DWORD,lpSource:DWORD,lpPattern:DWORD

; ------------------------------------------------------------------
; InString searches for a substring in a larger string and if it is
; found, it returns its position in eax.
;
; It uses a one (1) based character index (1st character is 1,
; 2nd is 2 etc...) for both the "StartPos" parameter and the returned
; character position.
;
; Return Values.
; If the function succeeds, it returns the 1 based index of the start
; of the substring.
; 0 = no match found
; -1 = substring same length or longer than main string
; -2 = "StartPos" parameter out of range (less than 1 or longer than
; main string)
; ------------------------------------------------------------------

LOCAL sLen:DWORD
LOCAL pLen:DWORD

push ebx
push esi
push edi

invoke StrLen,lpSource
mov sLen, eax ; source length
invoke StrLen,lpPattern
mov pLen, eax ; pattern length

cmp startpos, 1
jge @F
mov eax, -2
jmp isOut ; exit if startpos not 1 or greater
@@:

dec startpos ; correct from 1 to 0 based index

cmp eax, sLen
jl @F
mov eax, -1
jmp isOut ; exit if pattern is as long as or longer than source
@@:

sub sLen, eax ; don't read past string end
inc sLen

mov ecx, sLen
cmp ecx, startpos
jg @F
mov eax, -2
jmp isOut ; exit if startpos is past end
@@:

; ----------------
; setup loop code
; ----------------
mov esi, lpSource
mov edi, lpPattern
mov al, [edi] ; get 1st char in pattern

add esi, ecx ; add source length
neg ecx ; invert sign
add ecx, startpos ; add starting offset

jmp Scan_Loop

align 16

; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Pre_Scan:
inc ecx ; start on next byte

Scan_Loop:
cmp al, [esi+ecx] ; scan for 1st byte of pattern
je Pre_Match ; test if it matches
inc ecx
jnz Scan_Loop

jmp No_Match

Pre_Match:
lea ebx, [esi+ecx] ; put current scan address in EBX
mov edx, pLen ; put pattern length into EDX

Test_Match:
mov ah, [ebx+edx-1] ; load byte edx-1 of the current match window in main string
cmp ah, [edi+edx-1] ; compare it with byte edx-1 in pattern
jne Pre_Scan ; jump back on mismatch
dec edx
jnz Test_Match ; 0 = match, fall through on match

; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Match:
add ecx, sLen
mov eax, ecx
inc eax
jmp isOut

No_Match:
xor eax, eax

isOut:
pop edi
pop esi
pop ebx

ret

InString endp

; ########################################################################
Posted on 2002-06-30 03:09:25 by hutch--
Hutch,

I cut and pasted it, but I get an error when assembling:

.386
.model flat, stdcall ; 32 bit memory model
option casemap :none ; case sensitive

align 16
c:\masm32\m32lib\instring.asm(81) : error A2189: invalid combination with segment alignment : 16

Is there a reason for this that I am missing?

Enjoy your work, P1

PS: InString is intermittently throwing GPFs. I was hoping this revised one would help.
Posted on 2002-07-26 12:17:15 by Pone
Pone,

Change the processor directive to .486 or higher to use align 16.

I have done extensive testing on this algo, as I had to debug an unusual condition with the last one, but you must make sure that the memory addresses of both the source and the pattern are valid and that the pattern length does not exceed the actual length of the source.

Regards,

hutch@movsd.com
Posted on 2002-07-27 07:19:32 by hutch--
hutch, I think you have a mistake in the code:





Pre_Scan:
inc ecx ; start on next byte

Scan_Loop:
cmp al, [esi+ecx]
je Pre_Match
inc ecx
jnz Scan_Loop

jmp No_Match

Pre_Match:
lea ebx, [esi+ecx]
mov edx, pLen
Test_Match:
mov ah, [ebx+edx-1]
cmp ah, [edi+edx-1]
jne Pre_Scan ; <- here you jump to Pre_Scan, where you inc ECX and don't check for zero


bye

eko
Posted on 2002-07-27 10:30:18 by eko
Hi,

I had a few more ideas for InString, and here they are: four versions of InString.
1. Case sensitive: I improved it a little more since last time.
2. Case insensitive.
3. Whole word.
4. Whole word, 2nd version, using a faster way to check whether a byte is a word character.
bye

eko

P.S.
If you need more speed and you already know the lengths of your string and pattern, lose the StrLen calls and add two more parameters to the function:
stringlen, patternlen

EDIT: updated the 4 files, see above.
Posted on 2002-07-27 10:32:27 by eko
The code that tests for zero is well before the main scan loop.


invoke StrLen,lpSource
mov sLen, eax ; source length
invoke StrLen,lpPattern
mov pLen, eax ; pattern length


It exits the scan loop when the counter reaches zero:



Scan_Loop:
cmp al, [esi+ecx] ; scan for 1st byte of pattern
je Pre_Match ; test if it matches
inc ecx
jnz Scan_Loop ; <<<< exit point here

jmp No_Match


Regards,

hutch@movsd.com
Posted on 2002-07-28 03:27:58 by hutch--
I think you didn't understand me.



Pre_Scan:
inc ecx ; start on next byte

Scan_Loop:
cmp al, [esi+ecx]
je Pre_Match
inc ecx
jnz Scan_Loop

jmp No_Match

Pre_Match:
lea ebx, [esi+ecx]
mov edx, pLen
Test_Match:
mov ah, [ebx+edx-1]
cmp ah, [edi+edx-1]
jne Pre_Scan ; <- here you jump to inc ECX, but what if ESI+ECX was the last byte? You inc ECX past zero and it overflows

If you do it like that, you should jump to:


Scan_Loop:
cmp al, [esi+ecx]
je Pre_Match
Pre_Scan:
inc ecx
jnz Scan_Loop

jmp No_Match

But have a look at my new InString version.
Posted on 2002-07-28 05:48:10 by eko
The .386 was what was left in the original instring.asm when I cut and pasted the updated proc code.

Enjoy your work, P1
Posted on 2002-07-28 16:20:49 by Pone
Hi hutch,

You really need to fix InString, as both versions (the MASM32 one and the fix) are faulty. Look at what eko meant:

Pre_Loop:
pop ecx ; restore ECX <- if ECX was 8 and EBX was 9 (it was the last byte to search)
inc ecx ; start on next byte <- then ECX = 9

Loop_Start:
cmp al, ; not matching
je Pre_Sub
inc ecx ; here ECX = 0Ah
cmp ecx, ebx ; now ECX > EBX and the loop never ends until
; ECX overflows OR the routine finds a matching byte in memory
; beyond the string range, and it even gives a wrong hit on the substring itself;
; trace with a debugger to see it
jne Loop_Start

In the attached program the routine gives a wrong hit; worse, when I put it into a DLL it gave access violations because of the ECX overflow.

P.S. Sorry, I pulled the wrong proc out of the DLL. Still, your updated one gives me the same wrong hit; I just don't have time to trace into it with a debugger right now.
Posted on 2002-08-08 13:15:43 by ramzez
ramzez,

I've known about this since Hutch posted it,
but I have been too busy to offer any help on it.

I pulled it into VS to look at it and noticed that the pointer overran the end of the string until it hit the process memory limit and then caused a GPF.

Enjoy your work, P1
Posted on 2002-08-08 20:43:47 by Pone
Sorry to be a bit slow, but I have had a lot to do recently.

The problem with the algo was not the additional inc ecx but the exit condition. JNZ allowed a mismatch to run past the end of the buffer in some instances, as it only tests for zero, not for zero or more. I modified the algo to test the sign instead, so that the counter exits properly once it reaches zero or higher.

This is how I tested the results.


.data
buffer3 db "This is a test",0,"with trailing string data",0
buffer4 db "t",0
buffer5 db "string",0
.code

invoke InStringx,1,ADDR buffer3, ADDR buffer4
ShowReturn hWnd, eax

invoke InStringx,1,ADDR buffer3, ADDR buffer5
ShowReturn hWnd, eax

I have tested various combinations of matches and mismatches, and tested for a match past the first terminating zero. The results I am getting only ever match before the first terminating zero and mismatch in every other combination.

The reason why I have persisted with this format is to keep the main scan loop down to 4 instructions, as it is the most speed-critical part of a scanner. The branch that tests for a full string match does not particularly affect the main algorithm's speed. That compare loop uses an early out on mismatch to keep the branch time down, and it uses EKO's design of scanning the match backwards to reduce the number of instructions needed to set it up.

This is the code below. It has only 1 instruction different: it changes a JNZ to a JS.


; #########################################################################

InStringx proc startpos:DWORD,lpSource:DWORD,lpPattern:DWORD

; ------------------------------------------------------------------
; InString searches for a substring in a larger string and if it is
; found, it returns its position in eax.
;
; It uses a one (1) based character index (1st character is 1,
; 2nd is 2 etc...) for both the "StartPos" parameter and the returned
; character position.
;
; Return Values.
; If the function succeeds, it returns the 1 based index of the start
; of the substring.
; 0 = no match found
; -1 = substring same length or longer than main string
; -2 = "StartPos" parameter out of range (less than 1 or longer than
; main string)
; ------------------------------------------------------------------

LOCAL sLen:DWORD
LOCAL pLen:DWORD

push ebx
push esi
push edi

invoke StrLen,lpSource
mov sLen, eax ; source length
invoke StrLen,lpPattern
mov pLen, eax ; pattern length

cmp startpos, 1
jge @F
mov eax, -2
jmp isOut ; exit if startpos not 1 or greater
@@:

dec startpos ; correct from 1 to 0 based index

cmp eax, sLen
jl @F
mov eax, -1
jmp isOut ; exit if pattern is as long as or longer than source
@@:

sub sLen, eax ; don't read past string end
inc sLen

mov ecx, sLen
cmp ecx, startpos
jg @F
mov eax, -2
jmp isOut ; exit if startpos is past end
@@:

; ----------------
; setup loop code
; ----------------
mov esi, lpSource
mov edi, lpPattern
mov al, [edi] ; get 1st char in pattern

add esi, ecx ; add source length
neg ecx ; invert sign
add ecx, startpos ; add starting offset

jmp Scan_Loop

align 16

; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Pre_Scan:
inc ecx ; start on next byte

Scan_Loop:
cmp al, [esi+ecx] ; scan for 1st byte of pattern
je Pre_Match ; test if it matches
inc ecx
js Scan_Loop ; exit on sign inversion

; cmp ecx, 0 ; works but 1 instruction longer
; jl Scan_Loop

jmp No_Match

Pre_Match:
lea ebx, [esi+ecx] ; put current scan address in EBX
mov edx, pLen ; put pattern length into EDX

Test_Match:
mov ah, [ebx+edx-1] ; load byte edx-1 of the current match window in main string
cmp ah, [edi+edx-1] ; compare it with byte edx-1 in pattern
jne Pre_Scan ; jump back on mismatch
dec edx
jnz Test_Match ; 0 = match, fall through on match

; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Match:
add ecx, sLen
mov eax, ecx
inc eax
jmp isOut

No_Match:
xor eax, eax

isOut:
pop edi
pop esi
pop ebx

ret

InStringx endp

; ########################################################################

Regards,

hutch@movsd.com
Posted on 2002-08-08 23:30:26 by hutch--
Well, this fixed several of my apps that had intermittent GPFs.

Enjoy your work, P1 :alright:

PS: I do MASM32 work as time permits.
Posted on 2002-08-09 09:30:58 by Pone
I tested it in the DLL with an application that navigates within an exported Outlook folder text file or the like. The app searches for the line "subject" and displays the results in a listbox. I tested with ~2 MB files; there are of course multiple hits, and everything seems to be fine, with no wrong hits anymore.
Posted on 2002-08-15 11:00:07 by ramzez
Guys,

Thanks for verifying the procedure. The problem is always testing a procedure across a wide enough set of conditions to get it right.

Regards,

hutch@movsd.com
Posted on 2002-08-15 21:47:41 by hutch--