I've been searching the board for a case-insensitive InString algorithm that works with foreign (non-ASCII) characters, but I couldn't find one. Has anyone seen an algo like this?

Posted on 2003-08-03 15:06:07 by Delight
I have had to use CharUpper/CharLower first and then do the compare afterwards for case matching, if required.

Regards, P1 :cool:
Posted on 2003-08-04 10:43:56 by Pone
That sounds like a good idea, thanks :alright:
Posted on 2003-08-06 07:25:59 by Delight
I found a fast way to make a string upper/lowercase:

Alfa db 256 dup(0) ; one entry per byte value 0..255

xor ecx,ecx
@@:
inc ecx
mov byte ptr[Alfa+ecx],cl ; Alfa[n] = n
cmp ecx,255
jnz @B

invoke CharUpperBuff,offset Alfa,sizeof Alfa ; uppercase every entry in place

Then you can just do a mov al, [Alfa+eax] to convert the char in EAX (zero-extended, e.g. loaded with movzx) to uppercase. This doesn't help me much, though, because I would have to convert the whole search buffer to uppercase. That is not an option: I need the buffer unchanged, and I can't make a copy of it because it might be several MB of text.
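For anyone who wants to try the table idea outside MASM, here is a minimal C sketch of the same 256-entry lookup table. It is only an approximation: CharUpperBuff maps bytes according to the active Windows code page, whereas this sketch hard-codes the ISO-8859-1 (Latin-1) mapping as an assumption, and `build_upper_table` is a hypothetical name.

```c
/* Hypothetical helper: fills a 256-entry table so that table[c] is the
   uppercase form of byte c, ASSUMING the ISO-8859-1 (Latin-1) code page.
   It stands in for CharUpperBuff, whose result depends on the active
   Windows code page. */
static void build_upper_table(unsigned char table[256])
{
    for (int c = 0; c < 256; c++) {
        table[c] = (unsigned char)c;              /* identity by default */
    }
    for (int c = 'a'; c <= 'z'; c++) {
        table[c] = (unsigned char)(c - 0x20);     /* ASCII a-z -> A-Z */
    }
    for (int c = 0xE0; c <= 0xFE; c++) {
        if (c != 0xF7) {                          /* 0xF7 is the division sign */
            table[c] = (unsigned char)(c - 0x20); /* Latin-1 accented lowercase */
        }
    }
    /* 0xDF (sharp s) and 0xFF (y with diaeresis) have no uppercase form
       inside Latin-1, so they keep the identity mapping. */
}
```

With the table built, `table[(unsigned char)c]` plays the role of `mov al, [Alfa+eax]`: one indexed load per character, and the source buffer itself is never touched.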
Posted on 2003-08-06 12:41:14 by Delight
No, that's the solution you want.

Use it like this (example: case-insensitive string compare):

;generate the Alfa table here

mov esi, string1
mov edi, string2
@@:
movzx eax,byte ptr [esi]
mov al, [Alfa+eax] ;generate temporary uppercase
movzx ebx,byte ptr [edi]
mov bl, [Alfa+ebx] ;also here (for second string)
cmp al,bl ;compare them
jne not_equal
test al,al
jz equal ;both strings ended together
inc esi ;advance to the next character pair
inc edi
jmp @b
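The same compare loop in C, as a sketch: `equal_nocase` is a hypothetical name, and the folding table is supplied by the caller (it is assumed to map only byte 0 to 0, as the Alfa table does). Neither string is modified; each byte is folded on the fly as it is read.

```c
/* Compares two NUL-terminated byte strings through a caller-supplied
   256-entry case-folding table: look up both bytes, compare, stop at the
   terminator. Returns 1 if equal, 0 if not.
   Assumes table[0] == 0 and no other byte maps to 0. */
static int equal_nocase(const char *s1, const char *s2,
                        const unsigned char table[256])
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    for (;;) {
        unsigned char ca = table[*a];   /* temporary uppercase, string 1 */
        unsigned char cb = table[*b];   /* temporary uppercase, string 2 */
        if (ca != cb) return 0;         /* not_equal */
        if (ca == 0) return 1;          /* both hit the terminator: equal */
        a++;                            /* advance to the next byte pair */
        b++;
    }
}
```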
Posted on 2003-08-06 13:38:15 by hartyl
The problem is that I can't think of any place to store the uppercased char because all the registers are already being used in the inner loop :( Here's the InString code for those of you who haven't already memorized it:

InString proc startpos:DWORD,lpSource:DWORD,lpPattern:DWORD

; ------------------------------------------------------------------
; InString searches for a substring in a larger string and if it is
; found, it returns its position in eax.
; It uses a one (1) based character index (1st character is 1,
; 2nd is 2 etc...) for both the "StartPos" parameter and the returned
; character position.
; Return Values.
; If the function succeeds, it returns the 1 based index of the start
; of the substring.
; 0 = no match found
; -1 = substring same length or longer than main string
; -2 = "StartPos" parameter out of range (less than 1 or longer than
; main string)
; ------------------------------------------------------------------

LOCAL sLen:DWORD ; source length, later the number of start positions
LOCAL pLen:DWORD ; pattern length

push ebx
push esi
push edi

invoke StrLen,lpSource
mov sLen, eax ; source length
invoke StrLen,lpPattern
mov pLen, eax ; pattern length

cmp startpos, 1
jge @F
mov eax, -2
jmp isOut ; exit if startpos not 1 or greater
@@:

dec startpos ; correct from 1 to 0 based index

cmp eax, sLen
jl @F
mov eax, -1
jmp isOut ; exit if pattern same length or longer than source
@@:

sub sLen, eax ; don't read past string end
inc sLen

mov ecx, sLen
cmp ecx, startpos
jg @F
mov eax, -2
jmp isOut ; exit if startpos is past end
@@:

; ----------------
; setup loop code
; ----------------
mov esi, lpSource
mov edi, lpPattern
mov al, [edi] ; get 1st char in pattern

add esi, ecx ; add source length
neg ecx ; invert sign
add ecx, startpos ; add starting offset

jmp Scan_Loop

align 16

; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Pre_Scan:
inc ecx ; start on next byte

Scan_Loop:
cmp al, [esi+ecx] ; scan for 1st byte of pattern
je Pre_Match ; test if it matches
inc ecx
js Scan_Loop ; exit on sign inversion

jmp No_Match

Pre_Match:
lea ebx, [esi+ecx] ; put current scan address in EBX
mov edx, pLen ; put pattern length into EDX

Test_Match:
mov ah, [ebx+edx-1] ; load last byte of pattern length in main string
cmp ah, [edi+edx-1] ; compare it with last byte in pattern
jne Pre_Scan ; jump back on mismatch
dec edx
jnz Test_Match ; 0 = match, fall through on match
; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

add ecx, sLen ; match: turn negative offset into 0 based index
mov eax, ecx
inc eax ; return 1 based index
jmp isOut

No_Match:
xor eax, eax

isOut:
pop edi
pop esi
pop ebx

ret

InString endp

; ########################################################################
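To make the control flow of InString easier to follow, here is a rough C transliteration (case-sensitive, like the original). It keeps the documented return codes but uses an ordinary ascending index instead of the negative-offset trick; `in_string` is a hypothetical name, and it assumes a non-empty pattern.

```c
#include <string.h>

/* Rough C transliteration of the InString logic (case-sensitive).
   Returns the 1-based index of the first match at or after startpos,
   0 if none, -1 if the pattern is as long as or longer than the source,
   -2 if startpos is out of range. Assumes a non-empty pattern. */
static long in_string(long startpos, const char *source, const char *pattern)
{
    long sLen = (long)strlen(source);
    long pLen = (long)strlen(pattern);

    if (startpos < 1) return -2;
    startpos--;                              /* 1-based -> 0-based */
    if (pLen >= sLen) return -1;
    sLen = sLen - pLen + 1;                  /* number of candidate start positions */
    if (sLen <= startpos) return -2;

    for (long i = startpos; i < sLen; i++) {
        if (source[i] != pattern[0]) continue;   /* Scan_Loop: first byte */
        long k = pLen;
        while (k > 0 && source[i + k - 1] == pattern[k - 1])
            k--;                                 /* Test_Match: last byte to first */
        if (k == 0) return i + 1;                /* 0-based -> 1-based */
    }
    return 0;                                    /* No_Match */
}
```

The assembly walks a negative offset up toward zero so that one `inc ecx` / `js` pair both advances and bounds-checks the scan; in C that is simply the loop condition `i < sLen`.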


One option is to push/pop a register on the stack, but that would probably make it terribly slow. Any ideas?
Posted on 2003-08-06 14:15:24 by Delight
"The problem is that I can't think of any place to store the uppercased char because all the registers are already being used in the inner loop "


Er... hoping you don't still code for the 486 or early Pentium :grin:... what about using the MMX registers? Personally I can't live without them.

Posted on 2003-08-07 05:28:59 by valy

I didn't think of that, thanks :) I'll see what I can do and then post the code here.
Posted on 2003-08-07 10:16:29 by Delight

What you are looking for is a table-based search algorithm that you pass your own table to, so that you can match any character set you like.

It basically means that instead of assuming the ASCII character set, as the InString algo does, you reference characters through a table you provide yourself.
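A minimal C sketch of that table-based approach: `in_string_table` (hypothetical name) keeps InString's return codes but reads every source and pattern byte through a caller-supplied 256-entry table, so the multi-megabyte buffer is neither copied nor changed. Which characters count as equal is entirely up to the table; for example, an uppercase table for Latin-1 would make ä (0xE4) match Ä (0xC4). It assumes a non-empty pattern.

```c
#include <string.h>

/* Hypothetical table-driven InString variant. Every byte of source and
   pattern is folded through a caller-supplied 256-entry table before
   comparing; neither buffer is modified. Return codes: 1-based index of
   the first match, 0 = no match, -1 = pattern as long as or longer than
   source, -2 = startpos out of range. Assumes a non-empty pattern. */
static long in_string_table(long startpos, const char *source,
                            const char *pattern, const unsigned char table[256])
{
    const unsigned char *s = (const unsigned char *)source;
    const unsigned char *p = (const unsigned char *)pattern;
    long sLen = (long)strlen(source);
    long pLen = (long)strlen(pattern);

    if (startpos < 1) return -2;
    startpos--;                              /* 1-based -> 0-based */
    if (pLen >= sLen) return -1;
    sLen = sLen - pLen + 1;                  /* number of candidate start positions */
    if (sLen <= startpos) return -2;

    unsigned char first = table[p[0]];       /* fold the first pattern byte once */
    for (long i = startpos; i < sLen; i++) {
        if (table[s[i]] != first) continue;  /* scan for folded first byte */
        long k = pLen;
        while (k > 0 && table[s[i + k - 1]] == table[p[k - 1]])
            k--;                             /* compare folded bytes, last to first */
        if (k == 0) return i + 1;            /* 0-based -> 1-based */
    }
    return 0;
}
```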
Posted on 2003-08-22 21:26:24 by hutch--