well to start another topic about string comparing..
the strncmpi function compares with case insencitive of a specified length, theres no standard strncmpi, but theres a strcmpi and a strncmp in standard c++ libs... so i coded a strncmpi in c++, and inline asm.
as i figured, my asm one won out. but not by too much. im at an impass, and i cant think of any thing more to improve my functions speed.
the more chars that cases dont match increase the time it takes to complete.
take a look at the attachment. i also go by cncr04s if that matters at all ;)


cncr04s ASM | 145 clocks | result 0   <- all cases match    85 chars
cncr04s ASM | 761 clocks | result 0   <- all cases dont match    85 chars
cncr04s C++ | 2645 clocks | result 0   <- compiler on 'release'    85 chars
Attachments:
Posted on 2005-08-28 20:50:07 by Qages
1.what about national characters?
2.if only a..z used and length of string is known - why not to compare full dwords:
...
mov eax,
(or lodsd - i always use it, but someone say it slow - do not know)
mov edx,
or eax,20202020h
or edx,20202020h
cmp eax,edx
...
if the end do not fit in dword - shr or and with eg. 0FF000000h
Posted on 2005-08-30 09:15:19 by Shoo
it does compare full dwords, and lodsd is hellisly slow.
it doesnt use only a-z. if it is a-z it makes it uppercase and makes the other uppercase if its a-z and compares, if they match its ok if not then its not same char.

i know nothing of national chars case,
Posted on 2005-08-30 12:02:26 by Qages
Hi
Here is the proc used in the new release of the ObjMem32 library of ObjAsm32.

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align @WordSize
StrLIComp proc pString1:Pointer, pString2:Pointer, dMaxLen:dword
    push ebx                    ;Save ebx
    mov ecx,           ;Load first pointer
    xor eax, eax
    mov edx,         ;Load second pointer
    mov ebx,
    inc ebx
align @WordSize
@@Char:
    dec ebx
    jnz @@Next
    xor eax, eax                ;eax = 0 (same string)
    pop ebx                    ;Restore ebx
    ret 12                      ;Return
align @WordSize
@@Next:
    mov al,               ;Load char to compare
    or al, al                  ;Test if end of string
    jz @@Eq?                    ;Compute result
    cmp al,               ;Compare with the char of the other string
    jnz @@ICmp                  ;Chars are not equal, check if for capitals
    inc ecx                    ;Goto for next char
    inc edx
    jmp @@Char
align @WordSize
@@ICmp:
    cmp al, 'z'                ;Check range 'A'..'Z' or 'a'..'z'
    ja @@Eq?
    cmp al, 'A'
    jb @@Eq?
    cmp al, 'a'
    jae @@1
    cmp al, 'Z'
    ja @@Eq?
@@1:
    xor al, 20h                ;Swap lowercase - uppercase
    cmp al,               ;Compare again
    jne @@2                    ;If not equal, compute result
    inc ecx                    ;Goto for next char
    inc edx
    jmp @@Char
@@2:
    and al, not 20h            ;Make uppercase
@@Eq?:
    xor ecx, ecx
    mov cl,               ;Get char
    cmp cl, 'a'                ;Check range 'a'..'z'
    jb @@Exit
    cmp cl, 'z'
    ja @@Exit
    and cl, not 20h            ;Make uppercase
@@Exit:
    sub eax, ecx                ;Subtract to see which is smaller
    pop ebx
    ret 12
StrLIComp endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


Regards,

Biterider
Posted on 2005-08-30 15:22:41 by Biterider
2.i was wrong with dwords :) i understand this later - dword can contain also alphabet characters and non-alphabet: i forgot that i've been using this method to search known words. with this a story:
Praporshik (highest non-officer range in our army - hero of many stories) provides a lesson for young soldiers. He says strickly:
- Write down, comrades soldiers: "Water starts boil when 90 degree"
One soldier hesitately rise his hand.
- What do you want, comrade soldier?
Soldier says:
- Of course, I'm not sure, but it seems to me, in the school we were study the water starts boil when another temperature...
Praporshik silently searches a while something in his old thik notebook, then says:
- Yes, comrades soldiers, correct: "Water starts boil when 100 degree". I was just mixed it with rect angle.


1.without national characters support this function can became unusable. characters with value over 127 can be compared using passed cross-table. why it is important: here is my work folder (other have same) - how you can search files there with it?

regards!
Attachments:
Posted on 2005-08-31 00:04:22 by Shoo
first, it will compare all chars, but if there a-z it will make them uppercase, so if u speek russian, youll just have to remember the cases must match.
new results with code posed by bitrider..
as you can see my code still is faster!

cncr04s ASM | 145 clocks | result 0 <-- all cases match
cncr04s C++ | 2762 clocks | result 0 <-- all cases match
OBJASM ASM | 573 clocks | result 0 <-- all cases match

cncr04s ASM | 779 clocks | result 0 <-- all cases dont match
cncr04s C++ | 2776 clocks | result 0 <-- all cases dont match
OBJASM ASM | 1007 clocks | result 0 <-- all cases dont match

Posted on 2005-08-31 11:48:02 by Qages
1.
so if u ..., youll just have to remember
- absolutely inacceptable ;) - if you think so you should not compare it with lstrcmpi - they are not equivalent (lstrcmpi compares all alphabetical characters case insensitive depending on current code page - and we need exactly this) - such function can be usefull only in specific cases, where only standart latin letters are used (eg. compilers etc.)

2.to Biterider:
how about this (did not tested - just thought and typed :)):

@@ICmp:
    xor ecx, ecx
    mov cl,                ;Get char
    cmp al, 'z'                 ;Check range 'A'..'Z' or 'a'..'z'
    ja @@Eq?
    cmp al, 'A'
    jb @@Eq?
    cmp al, 'a'
    jae @@1
    cmp al, 'Z'
    ja @@Eq?
@@1:
    xor al, 20h                 ;Swap lowercase - uppercase
    cmp al,                ;Compare again
    jne @@2                     ;If not equal, compute result
    inc ecx                     ;Goto for next char
    inc edx
    jmp @@Char
@@2:
    and al, not 20h             ;Make uppercase
    and cl, not 20h             ;Make uppercase - if already - not affected
@@Eq?:


regards!
---
oh, i see i did not all changes, but have no time currently, but idea is this: get another character and "and" it depending on result of first comparison to avoid two extra comparisons/conditional jumps

r!
Posted on 2005-09-01 00:07:29 by Shoo
Hi Qages
I take a quick look into your code and found 3 week points:

  • Your code is not capable to handle the case when the length parameter is bigger than the comparing string lengths.

  • Since you are reading in words step without alignment, it possible that you read out of a page boundary causing a GPF.

  • Since the reading is not limited to the real string lengths, it is possible that the reading can go beyond a page limit causing a GPF.



Regards,

Biterider
Posted on 2005-09-01 02:19:18 by Biterider
Hi Shoo
The problem is that you must check the char in cl to be in the range "a" .. "z" before you capitalize it. All other chars should not be changed!

Regards

Biterider
Posted on 2005-09-01 02:27:32 by Biterider
i know. if the 1st char is in 'A..Z' or 'a..z' range and you will reset 20h bit and will reset same bit in cl they will be equal only if cl is also in that range. of course, this can bring wrong sorting when not-alphabetic chars are used in the words (between 'Z' and 'a') - it need to see on it, but it will not affect comparison anyway, i think: needs to try, mby later :)

regards!

;++++++++++++++++++++++++
Hi Biterider

he-he, i made a better look and understand that function's 'tale' is to make proper comparison of symbols (first time it was seemed to me you get both characters each cycle), so, everything is right there. in other side i think, although such procedure is universal (you can use it just for compare and also for sorting depending on its result) i think there is a sence to have separate functions for this two tasks when speed more needed then size (programmer, when call it, always knows what he goes to do). also, i do not see the reason to compare non-alphabetic characters with alphabetic - it seems to me better in this case alphabetic should be always less (or always greater) then non-alphabetic. i'd played yestarday with table-based comparison, but both two variants were refused even not finished :) i'll continue it later...
Posted on 2005-09-01 02:41:48 by Shoo
Hi everybody!

Yep, table lookup is not very fast. :sad:
I wrote this code a few month ago.
Comparsion of two strings, (85 chars each) takes about 1000 clock's on P4-2400 i845G.
Works fine with English/Russian/Ukrainian chars. :D
Attachments:
Posted on 2005-09-02 09:06:27 by Bohdan
I also tried the table lookup algo to check if it is faster, but the result was negative. The above routine showed to be faster.

Biterider
Posted on 2005-09-02 09:22:25 by Biterider

Some improvements to my code.  :) :) :)


All cases match...
Average execution time: 451 cycles.

All cases dont match...
Average execution time: 720 cycles.

Attachments:
Posted on 2005-09-02 10:49:55 by Bohdan
Hi Bohdan
Have you noticed that you can out of a page boundary if you don't align the dword readings?

Regards

Biterider
Posted on 2005-09-02 15:18:39 by Biterider
Hi Biterider !

here is a part of improoved your procedure ;)
align @WordSize
@@ICmp:
    cmp al, 0C0h
    jb @@Eq?
@@1:
    xor al, 20h                ;Swap lowercase - uppercase
    cmp al,               ;Compare again
    jne @@2                    ;If not equal, compute result
    inc ecx                    ;Goto for next char
    inc edx
    jmp @@Char
@@2:
    and al, not 20h            ;Make uppercase
@@Eq?:
    xor ecx, ecx
    mov cl,               ;Get char
    cmp cl, 0E0h                ;Check range '?'..'??'
    jb @@Exit
    sub cl, not 20h            ;Make uppercase
@@Exit:


although it compares only russian letters instead of latin it will work even faster!

of course, it is a joke :) but characters, "additional" for some of you, are the "main" for us, and we need a real procedure, not just exersize.

Hi Bohdan!

just downloaded everything (from both topics) - will look later. did you used xlat? i've been thinking about it, maybe it should faster and better and not too bloating table.

regards!
Posted on 2005-09-02 22:40:02 by Shoo
Hi Shoo!

Yes, i use xlatb. Lookup table is tricky a bit, uppercase ASCII codes are switched with lowercase and vice versa.
With this table only one xlatb instruction is needed.
All comparsion looks like this:

                        cmp    al, bl
                        je      SameChars
                        xlatb
                        je      SameChars
NoMatch:          ...............
Posted on 2005-09-03 03:39:45 by Bohdan
Hi Bohdan!
yes, i already made quick look and currently thinking about this. i think this discussion should be continued in "table" topic :) later - you know: weekend - a lot of work at home ;)

i thought it is just need to show to those who never used national characters why calculation is not effective at all. on the picture you can see uppercased combined alphabet i've using to uppercase/sort factor calculation. below you can see how some of those characters chaotically spreaded through 128 - 255.

what about functions presented here they are also needed to fast comparation of urls, emails, programming languages where only standart latin can be used, in other words, just mark it as "... function for standart latin alphabet only"

regards!
Attachments:
Posted on 2005-09-03 06:06:55 by Shoo