Hi guys,
I need a little help (AGAIN :P).
Is there a fast method for forcing a byte to lower case, regardless of whether it is already lower case?
i.e. a --> a
A --> a
What I'm trying to do is write a case-insensitive string compare routine, and the biggest bottleneck as far as I can see is the two comparisons (.if al<='Z' && al>='A'). This also has to be performed on both sources!
I'd appreciate any pointers! :D
Unless you want to use an API call (like _stricmp), you've pretty much got it down (for ASCII at least).
.if al<=Z || al>=A
add al, 20h ;convert to ASCII lower-case equivalent...
.endif
Thx, SpooK.
To SpooK,
shouldn't that be
.if al >= 'A' && al <= 'Z'
add al, 20h ;"or al,20h" will also do :) (I think "or" is faster than "add"???)
.endif
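For reference, here is the corrected range check written in C. This is a sketch; the name `ascii_to_lower` is made up for illustration and it only handles plain ASCII. It also shows a common way to drop one of the two comparisons the original question worries about: subtract 'A' and do a single unsigned compare, so anything below 'A' wraps around to a large value and fails the test.

```c
#include <stdint.h>

/* Fold one ASCII byte to lower case; bytes outside 'A'..'Z' pass through.
   (uint8_t)(c - 'A') < 26 replaces the two-sided test c >= 'A' && c <= 'Z':
   values below 'A' wrap around to 0xBF..0xFF and fail the comparison. */
static inline uint8_t ascii_to_lower(uint8_t c)
{
    if ((uint8_t)(c - 'A') < 26u)
        c |= 0x20;              /* set bit 5: 'A' (0x41) -> 'a' (0x61) */
    return c;
}
```

Using `or` (here `|=`) instead of `add` gives the same result for 'A'..'Z', since bit 5 is always clear in those bytes.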
There could be something more efficient; I just wanted to throw out a quick example.
If you care to examine the BINARY for an ASCII character value, you will find that bit 5 (value 20h) determines upper/lower case.
a = 01100001
A = 01000001
What you want to do is mask out bit 5 before performing your comparison of the byte values.
If you are clever, you can mask and compare 4 bytes at a time.
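A minimal C sketch of the mask idea, comparing four bytes per step (the name `eq_masked` and its fixed-length interface are assumptions for illustration). Because the same mask is applied to every byte, byte order within the 32-bit word does not matter. Note that the mask alone also treats some non-letter pairs as equal, for example '@' (40h) and '`' (60h).

```c
#include <stdint.h>
#include <string.h>

#define CASE_MASK_8  0xDFu        /* clears bit 5 of one byte   */
#define CASE_MASK_32 0xDFDFDFDFu  /* clears bit 5 of four bytes */

/* Returns 1 if the first n bytes of a and b are equal after clearing
   bit 5 in every byte, else 0. Only meaningful for letters. */
static int eq_masked(const char *a, const char *b, size_t n)
{
    size_t i = 0;
    /* four bytes at a time; memcpy avoids unaligned/aliasing issues */
    for (; i + 4 <= n; i += 4) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, 4);
        memcpy(&wb, b + i, 4);
        if ((wa & CASE_MASK_32) != (wb & CASE_MASK_32))
            return 0;
    }
    /* leftover bytes one at a time */
    for (; i < n; i++)
        if (((uint8_t)a[i] & CASE_MASK_8) != ((uint8_t)b[i] & CASE_MASK_8))
            return 0;
    return 1;
}
```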
Umm... just want to verify the comparison you have done:
.if al<=Z || al>=A
Isn't this just wrong!!! (Assuming that by Z you mean 'Z', I'll give you that.)
A value like 20h (space) will get through this OR condition, won't it? (20h is less than 'Z', so the other half of the OR doesn't matter: 1 OR 0 is 1, TRUE.)
The same goes for ASCII 96 (one less than 'a'): the entire OR condition succeeds,
since 96 is NOT less than 'Z' but is greater than 'A', and again 0 OR 1 is 1.
:shock: :shock: :shock:
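The argument above can be checked exhaustively with a quick C sketch (the function names are hypothetical). Every byte value is either <= 'Z' or >= 'A', so the OR version accepts all 256 values, while the AND version accepts exactly the 26 upper-case letters.

```c
/* Count how many byte values each version of the test lets through. */
static int count_or_version(void)
{
    int n = 0;
    for (int c = 0; c < 256; c++)
        if (c <= 'Z' || c >= 'A')   /* the original (buggy) condition */
            n++;
    return n;
}

static int count_and_version(void)
{
    int n = 0;
    for (int c = 0; c < 256; c++)
        if (c >= 'A' && c <= 'Z')   /* the corrected condition */
            n++;
    return n;
}
```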
Sorry, cut n' pasted asmrixstar's example and filled in the blank for him. Probably should have checked it first.
At any rate, listen to Homer; his answer was more thorough.
copied directly from my library :)
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
StrCmpi proc pStr1:DWORD,pStr2:DWORD
push edi
push esi
or al,-1 ;al != 0 so the first "test al,al" does not exit
mov edi,[8+esp+1*4];str1
mov esi,[8+esp+2*4];str2
@@:
test al,al
jz @F
mov al,[esi] ;al = current char of str2
mov dl,[edi] ;dl = current char of str1
inc esi
inc edi
cmp dl,al
je @B
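sub al,'A' ;bias so 'A'..'Z' becomes 0..25 (lower values wrap)
cmp al,'Z'-'A'+1 ;CF=1 only if the char was 'A'..'Z'
sbb cl,cl ;cl = 0FFh if upper case, else 0
and cl,'a'-'A' ;cl = 20h if upper case, else 0
add al,cl ;fold to lower case
add al,'A' ;remove the bias
sub dl,'A' ;same folding for dl
cmp dl,'Z'-'A'+1
sbb cl,cl
and cl,'a'-'A'
add dl,cl
add dl,'A'
cmp dl,al
je @B ;equal after folding: next char
sbb al,al ;al = -1 if dl < al, else 0
sbb al,-1 ;al = -1 (dl < al) or 1 (dl > al)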
@@:
movsx eax,al
pop esi
pop edi
ret 2*4
StrCmpi endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF
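A rough C rendering of the same approach as StrCmpi, for readers who want to experiment with it (a sketch, not a drop-in replacement; `fold` and `str_cmpi` are made-up names). Like the assembly, it compares raw bytes first and only folds 'A'..'Z' when they differ, builds the 20h adjustment without a branch, and returns -1, 0 or 1.

```c
/* Branch-free fold of 'A'..'Z' to lower case; mirrors sub/cmp/sbb/and/add. */
static unsigned char fold(unsigned char c)
{
    /* (c - 'A') < 26 is 1 for upper case; negating gives all-ones, & 0x20 keeps 20h */
    unsigned char adj = (unsigned char)(-((unsigned char)(c - 'A') < 26u) & 0x20);
    return (unsigned char)(c + adj);
}

/* Case-insensitive compare of two NUL-terminated ASCII strings.
   Returns -1 if s1 < s2, 0 if equal, 1 if s1 > s2 (unsigned byte order). */
static int str_cmpi(const char *s1, const char *s2)
{
    for (;;) {
        unsigned char c1 = (unsigned char)*s1++;
        unsigned char c2 = (unsigned char)*s2++;
        if (c1 != c2) {             /* only fold when the raw bytes differ */
            c1 = fold(c1);
            c2 = fold(c2);
            if (c1 != c2)
                return c1 < c2 ? -1 : 1;
        }
        if (c1 == 0)                /* both strings ended together */
            return 0;
    }
}
```

Since only 'A'..'Z' are folded, pairs like '[' and '{' still compare as different.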
If speed matters, I'd recommend taking a look at the Boyer-Moore string search algorithm:
http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm
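A small C sketch of case-insensitive searching in this family. Note that it is Boyer-Moore-Horspool, a simplified variant that uses only the bad-character shift table, not the full algorithm from the article; `lc` and `bmh_search_ci` are hypothetical names, and only ASCII letters are folded. Folding happens both when building the shift table and during the comparisons, so neither string has to be converted beforehand.

```c
#include <stddef.h>

/* ASCII-only fold to lower case (single unsigned compare). */
static unsigned char lc(unsigned char c)
{
    return (unsigned char)(c - 'A') < 26u ? (unsigned char)(c | 0x20) : c;
}

/* Case-insensitive Boyer-Moore-Horspool search.
   Returns the offset of the first match of pat (length m) in text
   (length n), or -1 if there is none. */
static long bmh_search_ci(const char *text, size_t n, const char *pat, size_t m)
{
    size_t shift[256];
    if (m == 0) return 0;
    if (m > n) return -1;

    /* bad-character table, keyed by folded byte */
    for (size_t i = 0; i < 256; i++)
        shift[i] = m;
    for (size_t i = 0; i + 1 < m; i++)
        shift[lc((unsigned char)pat[i])] = m - 1 - i;

    for (size_t pos = 0; pos + m <= n; ) {
        size_t j = m;
        /* compare right to left on folded bytes */
        while (j > 0 && lc((unsigned char)text[pos + j - 1]) == lc((unsigned char)pat[j - 1]))
            j--;
        if (j == 0)
            return (long)pos;
        /* shift by the folded last byte of the current window */
        pos += shift[lc((unsigned char)text[pos + m - 1])];
    }
    return -1;
}
```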
Whether BM will be faster depends on the string lengths, but if you have a lot of data to search through... sure. This thread is about string comparison, though, not string searching.
Wow, big response!
And the winner for most helpful... HOMER :P
Yeah, that's exactly what I was looking for, thx.
Thx to all... :)
Just remember that Homer's method will only work for English text.
Actually it won't work in general. It works only with (1) English text and (2) letters; both of these conditions must be met. Such a method wouldn't find "f0dder" if you wrote "F0dder". So you must first check whether the character in question is actually a letter. Otherwise you would wipe out the spaces (0x20 AND 0xDF = 0) and as a result find nothing. English string comparisons can be written neatly using MMX (because its comparison instructions zero out the non-matching bytes). Without MMX you must use this "if al<=Z || al>=A" thing (and it still only lets you find English strings).
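A sketch of the "check that it is a letter first" approach, still four bytes at a time but in plain C rather than MMX (a SWAR technique; `swar_to_lower` and `mask_upper` are hypothetical names). Each byte's range test ends up in that byte's high bit, similar in spirit to the per-byte masks MMX compares produce, and no carries cross byte boundaries because every byte is first limited to 7 bits. Only 'A'..'Z' change; spaces, digits, punctuation and bytes >= 80h are left alone, unlike the plain 0DFh mask shown for contrast.

```c
#include <stdint.h>

/* Lower-case only the ASCII letters in a 32-bit word, 4 bytes at once. */
static uint32_t swar_to_lower(uint32_t w)
{
    uint32_t low7  = w & 0x7F7F7F7Fu;        /* clear bit 7 of each byte      */
    uint32_t gt_Z  = low7 + 0x25252525u;     /* bit 7 set iff byte > 'Z'      */
    uint32_t ge_A  = low7 + 0x3F3F3F3Fu;     /* bit 7 set iff byte >= 'A'     */
    uint32_t ascii = ~w & 0x80808080u;       /* bit 7 set iff byte < 80h      */
    uint32_t upper = ascii & (ge_A ^ gt_Z);  /* bit 7 set iff byte is 'A'..'Z' */
    return w | (upper >> 2);                 /* move bit 7 down to bit 5 (20h) */
}

/* For contrast: the plain mask, which also changes non-letters
   (e.g. a space 20h becomes 00h). */
static uint32_t mask_upper(uint32_t w)
{
    return w & 0xDFDFDFDFu;
}
```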
If he considers speed, then I'd rather recommend "Turbo Boyer-Moore" (~2/3 of the original execution time), but that would require him to 'prepare' both strings, since this algorithm is case-sensitive, wouldn't it?
Umm... what _are_ you saying? I was pointing out the fact that the "OR condition" (||) is not correct and should be an "AND condition" (&&), that's all!!! :)
SpooK has even acknowledged the same...so what are you saying? :confused: :(
Regards,
Shantanu
I was referring to Homer's method of simply ANDing the bytes and comparing them. Your method is correct, from what I see, and it's exactly what I was suggesting ^^'