I've been searching the board for a case-insensitive InString algorithm that works with foreign characters (like ?,?,?,?...), but I couldn't find any. Has anyone seen an algo like this?
Delight
I have had to use CharUpper/CharLower first and then do the compare afterwards, when case-insensitive matching was required.
Regards, P1 :cool:
That sounds like a good idea, thanks :alright:
I found out a fast way to make a string upper/lowercase:
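.data?
Alfa db 256 dup(?)          ; one entry per byte value 0..255
.code
xor ecx,ecx
@@:
inc ecx
mov byte ptr[Alfa+ecx],cl   ; Alfa[i] = i (entry 0 stays 0, .data? is zero-filled)
cmp ecx,255
jnz @B
invoke CharUpperBuff,offset Alfa,sizeof Alfa   ; uppercase the whole table in place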
Then you can just do a mov al,[Alfa+eax] to convert the char in eax to uppercase. This doesn't help me much, though, because I would have to change the whole search buffer to uppercase. That is not an option: I need the buffer as it is, and I can't make a copy of it because it might be several MB of text.
No, that's the solution you want.
Use it like this (sample: case-insensitive string compare):
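;generate the Alfa buffer here
mov esi, string1
mov edi, string2
@@:
movzx eax,byte ptr [esi]
mov al, [Alfa+eax] ;temporary uppercase copy of the char from string1
movzx ebx,byte ptr [edi]
mov bl, [Alfa+ebx] ;same for the char from string2
inc esi ;advance both pointers, the strings themselves stay untouched
inc edi
cmp al,bl ;compare them
jne not_equal
test al,al ;chars are equal: was it the terminating zero?
jz equal
jmp @B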
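For readers who find the idea easier to follow in C, here is a minimal sketch of the same table-driven compare. It rests on some assumptions: `build_upper_table` and `equal_nocase` are names made up for this illustration, and the portable `toupper` stands in for `CharUpperBuff`, so in the default "C" locale it only folds a-z. The point it tries to show is that neither string is modified or copied. Each byte is folded through the table at the moment it is compared.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Fill a 256-entry table so that table[c] is the uppercase form of byte c.
   toupper() is a portable stand-in; the thread builds the table with
   CharUpperBuff instead (assumption made for this sketch). */
static void build_upper_table(unsigned char table[256])
{
    assert(table != NULL);
    for (int c = 0; c < 256; ++c)
        table[c] = (unsigned char)toupper(c);
}

/* Return 1 if a and b are equal ignoring case, 0 otherwise.
   Both strings are only read, never written. */
static int equal_nocase(const char *a, const char *b,
                        const unsigned char table[256])
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (;; ++p, ++q) {
        if (table[*p] != table[*q])
            return 0;           /* folded bytes differ */
        if (*p == '\0')
            return 1;           /* both strings ended together */
    }
}
```

The cost per character is one extra table load, which is why the lookup can live inside the compare loop instead of requiring an uppercased copy of the text.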
The problem is that I can't think of any place to store the uppercased char because all the registers are already being used in the inner loop :( Here's the InString code for those of you who haven't already memorized it:
InString proc startpos:DWORD,lpSource:DWORD,lpPattern:DWORD
; ------------------------------------------------------------------
; InString searches for a substring in a larger string and if it is
; found, it returns its position in eax.
;
; It uses a one (1) based character index (1st character is 1,
; 2nd is 2 etc...) for both the "StartPos" parameter and the returned
; character position.
;
; Return Values.
; If the function succeeds, it returns the 1 based index of the start
; of the substring.
; 0 = no match found
; -1 = substring same length or longer than main string
; -2 = "StartPos" parameter out of range (less than 1 or longer than
; main string)
; ------------------------------------------------------------------
LOCAL sLen:DWORD
LOCAL pLen:DWORD
push ebx
push esi
push edi
invoke StrLen,lpSource
mov sLen, eax ; source length
invoke StrLen,lpPattern
mov pLen, eax ; pattern length
cmp startpos, 1
jge @F
mov eax, -2
jmp isOut ; exit if startpos not 1 or greater
@@:
dec startpos ; correct from 1 to 0 based index
cmp eax, sLen
jl @F
mov eax, -1
jmp isOut ; exit if pattern longer than source
@@:
sub sLen, eax ; don't read past string end
inc sLen
mov ecx, sLen
cmp ecx, startpos
jg @F
mov eax, -2
jmp isOut ; exit if startpos is past end
@@:
; ----------------
; setup loop code
; ----------------
mov esi, lpSource
mov edi, lpPattern
mov al, [edi] ; get 1st char in pattern
add esi, ecx ; add source length
neg ecx ; invert sign
add ecx, startpos ; add starting offset
jmp Scan_Loop
align 16
; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[B]
Pre_Scan:
inc ecx ; start on next byte
Scan_Loop:
cmp al, [esi+ecx] ; scan for 1st byte of pattern
je Pre_Match ; test if it matches
inc ecx
js Scan_Loop ; exit on sign inversion
jmp No_Match
Pre_Match:
lea ebx, [esi+ecx] ; put current scan address in EBX
mov edx, pLen ; put pattern length into EDX
Test_Match:
mov ah, [ebx+edx-1] ; load last byte of pattern length in main string
cmp ah, [edi+edx-1] ; compare it with last byte in pattern
jne Pre_Scan ; jump back on mismatch
dec edx
jnz Test_Match ; 0 = match, fall through on match
[/B]
; @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Match:
add ecx, sLen
mov eax, ecx
inc eax
jmp isOut
No_Match:
xor eax, eax
isOut:
pop edi
pop esi
pop ebx
ret
InString endp
; ########################################################################
end
An option is to use the stack and push/pop some register but that would probably make it terribly slow. Any ideas?
"The problem is that I can't think of any place to store the uppercased char because all the registers are already being used in the inner loop "
Hello
Er... hoping you don't code for the 486 or early Pentium :grin: ... what about using MMX registers? Personally, I cannot live without them.
Regards
Hi!
I didn't think of that, thanks :) I'll see what I can do and then post the code here.
Delight,
What you are after is a table-based search algorithm: you pass it your own table, so you can match any character set you like.
It basically means that instead of assuming the ASCII character set as the InString algo does, you reference characters in a table you provide yourself.
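A rough C sketch of such a table-based search follows, under stated assumptions. `instring_table` and `build_latin1_table` are hypothetical names, not part of any library. The example table assumes ASCII plus a Latin-1 / Windows-1252 style code page, where the lowercase letters 0xE0..0xFE sit 0x20 above their uppercase forms. The return value is simplified compared with InString: it is the 1-based index of the match, or 0 for anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example of a caller-provided table: identity mapping, then fold a-z and
   (assumption) the Latin-1 lowercase range 0xE0..0xFE to uppercase. */
static void build_latin1_table(unsigned char table[256])
{
    assert(table != NULL);
    for (int c = 0; c < 256; ++c)
        table[c] = (unsigned char)c;
    for (int c = 'a'; c <= 'z'; ++c)
        table[c] = (unsigned char)(c - 0x20);
    for (int c = 0xE0; c <= 0xFE; ++c)
        if (c != 0xF7)          /* 0xF7 is the division sign, not a letter */
            table[c] = (unsigned char)(c - 0x20);
}

/* Search source for pattern starting at the 1-based start_pos, comparing
   bytes only after mapping them through table. Returns the 1-based index
   of the first match, or 0 if there is none or the arguments are unusable. */
static size_t instring_table(size_t start_pos, const char *source,
                             const char *pattern,
                             const unsigned char table[256])
{
    const unsigned char *s = (const unsigned char *)source;
    const unsigned char *p = (const unsigned char *)pattern;
    size_t slen = strlen(source);
    size_t plen = strlen(pattern);

    if (start_pos < 1 || plen == 0 || plen > slen)
        return 0;

    unsigned char first = table[p[0]];      /* fold the first pattern byte once */
    for (size_t i = start_pos - 1; i + plen <= slen; ++i) {
        if (table[s[i]] != first)
            continue;                       /* scan for the first byte */
        size_t j = 1;
        while (j < plen && table[s[i + j]] == table[p[j]])
            ++j;                            /* verify the rest of the pattern */
        if (j == plen)
            return i + 1;                   /* convert to 1-based index */
    }
    return 0;
}
```

Because the table is an argument, switching to another character set only means filling the 256 bytes differently, for example with `CharUpperBuff` as shown earlier in the thread. The search routine itself stays the same.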