hi!
Here is a case-insensitive string comparison that uses a character table. I'm not sure it behaves exactly like lstrcmpi, but it can compare strings containing national characters, which may be scattered across the upper half of the 8-bit character set (codes 128-255) in non-alphabetical order. Although I'm not familiar with languages other than English, you can easily rebuild the table for your own language with the FASM compiler. Currently it compares English and Ukrainian/Russian/Belarusian/Bulgarian letters via a "mixed" alphabet. When alphabetic characters are compared, the result can be used for alphabetical sorting. Any non-alphabetical character always sorts either greater or smaller than every alphabetical one, depending on a parameter passed to the function.
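The attached FASM source isn't available here, so the following is only a minimal C sketch of the idea, not a translation of it. It assumes a hypothetical "mixed" alphabet (Latin a-z followed by the Russian letters of Windows-1251; Ukrainian/Belarusian extras are left out for brevity). The names `alpha_rank` and `mixed_strcmpi` and the `nonalpha_last` flag are made up for illustration; the flag plays the role of the parameter that decides whether non-alphabetic characters sort before or after all letters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "mixed" alphabet: Latin a-z first, then Cyrillic а-я as
   encoded in Windows-1251 (uppercase 0xC0-0xDF, lowercase 0xE0-0xFF).
   Returns 1..58 for letters (both cases share a rank), 0 otherwise. */
static int alpha_rank(unsigned char c)
{
    if (c >= 'a' && c <= 'z') return 1 + (c - 'a');   /* 1..26  */
    if (c >= 'A' && c <= 'Z') return 1 + (c - 'A');
    if (c >= 0xE0)            return 27 + (c - 0xE0); /* 27..58 */
    if (c >= 0xC0)            return 27 + (c - 0xC0);
    return 0;                                         /* not a letter */
}

/* Case-insensitive compare. Letters are ordered by alpha_rank(); a
   non-letter sorts after every letter when nonalpha_last is nonzero,
   otherwise before. Two non-letters compare by raw byte value.
   Returns <0, 0 or >0 like strcmp. */
int mixed_strcmpi(const char *a, const char *b, int nonalpha_last)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        unsigned char ca = (unsigned char)*a, cb = (unsigned char)*b;
        if (ca == 0 || cb == 0)
            return (int)ca - (int)cb;       /* a prefix sorts first */
        int ra = alpha_rank(ca), rb = alpha_rank(cb);
        if (ra && rb) {
            if (ra != rb)
                return ra - rb;             /* alphabetical order */
        } else if (ra || rb) {
            /* exactly one is a letter: the flag decides the order */
            int a_is_nonalpha = (ra == 0);
            return (a_is_nonalpha == (nonalpha_last != 0)) ? 1 : -1;
        } else if (ca != cb) {
            return (int)ca - (int)cb;       /* two non-letters */
        }
    }
}
```

Because letters are compared by rank rather than by code, the result is usable for alphabetical sorting, which a plain fold-then-subtract compare does not give you when national letters are scattered across 128-255.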

Written in between work, so it may contain errors :) Regards!
Posted on 2005-09-02 06:53:12 by Shoo
Good afternoon, Shoo

How can I translate it into MASM?

regards
Posted on 2005-09-02 09:47:11 by dcskm4200
Hi Shoo and dcskm4200.

Here is an updated version of the table-lookup algo I posted in the "Faster Strncmpi" thread.
It handles English/Russian/Ukrainian characters, but it's very easy to adapt to other codepages.
Delphi code that generates the table automatically is included (as a comment).
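For readers without Delphi, here is a rough C analogue of such a table generator (a sketch, not the included Delphi code). It assumes Windows-1251, where А-Я/а-я are the contiguous blocks 0xC0-0xDF/0xE0-0xFF while Ё, Є, Ї, І and Ґ sit elsewhere; `build_fold_table`, `print_table_as_db` and `cp1251_pairs` are hypothetical names. The output is a block of `db` lines that could be pasted into an assembler source.

```c
#include <assert.h>
#include <stdio.h>

/* Uppercase/lowercase pairs outside the contiguous Cyrillic block,
   assuming Windows-1251. */
static const unsigned char cp1251_pairs[][2] = {
    {0xA8, 0xB8}, /* Ё ё */
    {0xAA, 0xBA}, /* Є є */
    {0xAF, 0xBF}, /* Ї ї */
    {0xB2, 0xB3}, /* І і */
    {0xA5, 0xB4}, /* Ґ ґ */
};

/* Fills tab so that tab[c] is the lowercase form of byte c. */
void build_fold_table(unsigned char tab[256])
{
    for (int c = 0; c < 256; c++)
        tab[c] = (unsigned char)c;              /* identity by default */
    for (int c = 'A'; c <= 'Z'; c++)
        tab[c] = (unsigned char)(c + 0x20);     /* Latin A-Z -> a-z */
    for (int c = 0xC0; c <= 0xDF; c++)
        tab[c] = (unsigned char)(c + 0x20);     /* А-Я -> а-я */
    for (size_t i = 0; i < sizeof cp1251_pairs / sizeof cp1251_pairs[0]; i++)
        tab[cp1251_pairs[i][0]] = cp1251_pairs[i][1];
}

/* Prints the table as 16 "db" lines; "%03Xh" keeps a leading 0 so that
   values like 0E0h are read as numbers by the assembler. */
void print_table_as_db(const unsigned char tab[256], FILE *out)
{
    assert(out != NULL);
    for (int row = 0; row < 16; row++) {
        fprintf(out, "    db ");
        for (int col = 0; col < 16; col++)
            fprintf(out, "%03Xh%s", tab[row * 16 + col], col < 15 ? "," : "\n");
    }
}
```

Calling `build_fold_table(tab)` and then `print_table_as_db(tab, stdout)` prints the 256 values as 16 lines of 16.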

Results (P4-2400 i845G):

All cases match...
Average execution time: 451 cycles.

All cases dont match...
Average execution time: 720 cycles.


Best Regards...
Posted on 2005-09-02 10:46:20 by Bohdan
Hello, Bohdan

Here are my test results:

Hit any key to start ...
All cases match...
Average execution time: 383 cycles.

All cases dont match...
Average execution time: 548 cycles.


regards
Posted on 2005-09-02 11:02:33 by dcskm4200
Here is a little improvement to my algo.

1) The lookup table is generated automatically for the current codepage (StrCmpiBuitdTable).
2) Better instruction pairing.
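StrCmpiBuitdTable itself isn't shown here. As a hedged, portable analogue, the C sketch below builds the table at run time from whatever single-byte codepage the C runtime's current locale uses. It assumes `setlocale(LC_CTYPE, "")` selects an 8-bit codepage (as with a Windows ANSI codepage); under a UTF-8 locale `tolower` leaves bytes above 127 unchanged. `build_table_from_locale` and `table_strcmpi` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Fills tab so that tab[c] is the lowercase form of byte c in the
   current C locale. Call setlocale(LC_CTYPE, "") first to pick up the
   user's codepage; in the default "C" locale only A-Z are folded. */
void build_table_from_locale(unsigned char tab[256])
{
    for (int c = 0; c < 256; c++)
        tab[c] = (unsigned char)tolower(c);
}

/* Case-insensitive compare through the folding table. The result
   follows codepage order, not necessarily alphabetical order. */
int table_strcmpi(const char *a, const char *b, const unsigned char tab[256])
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    assert(tab[0] == 0);    /* the terminator must still fold to 0 */
    while (*pa && tab[*pa] == tab[*pb]) {
        pa++;
        pb++;
    }
    return (int)tab[*pa] - (int)tab[*pb];
}
```

Building the table once at startup keeps the per-character work in the compare loop down to two lookups and a subtraction.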

It may be buggy and needs testing.
Regards...
Posted on 2005-09-02 14:34:22 by Bohdan
Hi dcskm4200,
I'll do a MASM equivalent for you, but in the meantime you should add FASM to your armory ;)
regards!
Posted on 2005-09-03 06:08:49 by Shoo
Hello, Shoo

Thanks for doing that.

As a newbie, I've only just started on a long, hard process, and I'm tired now. If I add FASM to my armory too, my brain won't have any energy left.

best regards.
Posted on 2005-09-03 06:45:05 by dcskm4200
Hi dcskm4200

1. Here is a working converted source, only as an example, because I'm going to abandon this approach and use xlat instead (I'll start coding it once it is clear in my head).
2. IMHO, converting sources from one compiler to another is harder than simply having several compilers and using the appropriate one for each source.
3. I'm surprised you're interested in uppercasing Cyrillic letters :) Or are you just interested in the algo? Also, programming is just my hobby, so I can't promise my code is always correct ;)

regards!
Posted on 2005-09-05 00:40:49 by Shoo
I guess we can all agree that code which doesn't support international characters isn't all that useful.

A table-based approach should be pretty universal, at least as long as we are dealing with single-byte-per-character strings. This probably works with everything except Japanese and Chinese?

What about speed, though? A straightforward implementation needs a 256-byte uppercase table and another 256-byte lowercase table (if we decide to use the tables for more than just case-insensitive comparison). That must cost some speed because of cache usage, and of course because of the memory dependency (every character has to go through a table load before it can be compared)...

Another approach could be generating the routines on the fly... this would require detecting which range(s) change between upper and lower case (or are all alphabets contiguous? I think not).
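To see why a few hard-coded ranges are usually not enough, here is a hedged C sketch that scans an existing 256-byte folding table (tab[c] = lowercase of c) and collects maximal runs of consecutive codes that change case by the same offset. In Windows-1251, for instance, А-Я is one contiguous block, but Ukrainian І/і (0xB2/0xB3) sits elsewhere with a different offset. A code generator could emit one range check per run; `case_range` and `find_case_ranges` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One run of consecutive codes that all change case by the same offset. */
struct case_range {
    unsigned char first, last;  /* inclusive code range */
    int delta;                  /* lowercase = code + delta */
};

/* Collects maximal runs of consecutive codes sharing the same nonzero
   delta. Returns the number of runs found; at most max are stored. */
size_t find_case_ranges(const unsigned char tab[256],
                        struct case_range *out, size_t max)
{
    size_t n = 0;
    int c = 0;
    assert(out != NULL || max == 0);
    while (c < 256) {
        int delta = tab[c] - c;
        if (delta == 0) {       /* no case change for this code */
            c++;
            continue;
        }
        int start = c;
        while (c + 1 < 256 && tab[c + 1] - (c + 1) == delta)
            c++;                /* extend the run */
        if (n < max) {
            out[n].first = (unsigned char)start;
            out[n].last = (unsigned char)c;
            out[n].delta = delta;
        }
        n++;
        c++;
    }
    return n;
}
```

For a table covering A-Z, А-Я and just the two extra letters Ё and І, this finds four separate runs, which suggests the answer to the contiguity question is indeed "no" for Cyrillic codepages.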

Unicode would be more universal, but it probably isn't very suitable to implement with tables...
Posted on 2005-09-05 02:02:39 by f0dder
Hey, Shoo

Thank you very much.

It works fine.
By the way, I also appreciate your travesty: you're not only a coder but also an artist.

best regards
Posted on 2005-09-05 02:06:29 by dcskm4200
f0dder, there is no such thing as uppercase and lowercase in Chinese. I think the same thing applies to Japanese.  ;)
Posted on 2005-09-05 02:09:53 by roticv
Hello, roticv

Although there is no uppercase and lowercase in Chinese, there are traditional and simplified characters. Here is a piece of software that converts between them.

regards
Posted on 2005-09-05 03:30:00 by dcskm4200

Quoting roticv: "f0dder, there is no such thing as uppercase and lowercase in Chinese. I think the same thing applies to Japanese. ;)"


Ah, come to think of it, that makes sense: ideographs and such. Shows how much of a b**** it can be to support foreign languages correctly :)
Posted on 2005-09-05 04:20:10 by f0dder
I think multibyte characters need calculation, or a mixed method. BTW, Japanese has katakana and hiragana, which look different but sound the same.
As for tables that fit into 256 bytes, several are needed, at least 3-4:
1. uppercasing
2. lowercasing
3. comparison, returning a sort value
1 and 2 are clear. For 3, the character code is converted not into the appropriate case but into an order value based on the character's position in the alphabet. Why I said 3-4: it is possible to have two such tables, one placing non-alphabetic characters before the alphabetic ones and the other placing them after (it is not good when alphabetic characters are split up or surrounded by non-alphabetic ones). In general you can have many such tables; just pass the function a pointer to the one you need.
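A hedged C sketch of type-3 tables and of the pointer-passing idea: `build_order_table` (a hypothetical name) produces either a letters-first or a letters-last sort-value table, and one compare routine takes a pointer to whichever table is wanted, in the same spirit as the `x_tab` parameter of tstrcmp. Only Latin letters are listed to keep it short; a national alphabet would add its (uppercase, lowercase) pairs to the hypothetical `letter_pairs` string in alphabetical order, as long as the number of distinct values stays below 256.

```c
#include <assert.h>
#include <string.h>

/* Letters in the desired sort order, as (uppercase, lowercase) pairs. */
static const char letter_pairs[] =
    "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";

/* Builds a sort-value table: tab[0] = 0, both cases of a letter share one
   value, non-letters keep their code order. letters_first != 0 gives the
   letters the lowest values; otherwise they follow all non-letters.
   Every nonzero byte gets a nonzero value. */
void build_order_table(unsigned char tab[256], int letters_first)
{
    size_t nletters = strlen(letter_pairs) / 2;
    unsigned char is_letter[256] = {0};
    for (size_t i = 0; i < 2 * nletters; i++)
        is_letter[(unsigned char)letter_pairs[i]] = 1;

    int nnon = 0;                           /* non-letters among 1..255 */
    for (int c = 1; c < 256; c++)
        nnon += !is_letter[c];
    assert(nnon + (int)nletters <= 255);    /* values must fit in a byte */

    int letter_base = letters_first ? 1 : 1 + nnon;
    int next_non = letters_first ? 1 + (int)nletters : 1;

    tab[0] = 0;
    for (int c = 1; c < 256; c++)
        if (!is_letter[c])
            tab[c] = (unsigned char)next_non++;
    for (size_t i = 0; i < nletters; i++) {
        tab[(unsigned char)letter_pairs[2 * i]] = (unsigned char)(letter_base + i);
        tab[(unsigned char)letter_pairs[2 * i + 1]] = (unsigned char)(letter_base + i);
    }
}

/* One compare routine for any order table: returns <0, 0 or >0. */
int order_strcmp(const char *a, const char *b, const unsigned char *x_tab)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    for (;;) {
        int va = x_tab[*pa++], vb = x_tab[*pb++];
        if (va != vb)
            return va - vb;
        if (va == 0)
            return 0;                       /* both strings ended */
    }
}
```

Building both tables once and choosing between them per call avoids any branch on the "non-letters first/last" option inside the loop.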

Also, for speed, it is good to have a separate function that converts the case of only one string. For example, if you have a template name in your data and you know it is already lowercase, why should the function convert its case on every iteration? It should do that only for the second (unknown) string.
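A small C sketch of this one-sided idea, assuming the caller guarantees the template is already lowercase. `cmp_lower_template` is a hypothetical name, and `fold` is any 256-byte lowercase table (fold[0] must be 0) such as the ones discussed in this thread.

```c
#include <assert.h>

/* Compares a template that is already lowercase against an arbitrary
   string, folding only the second one. Returns <0, 0 or >0. */
int cmp_lower_template(const char *tmpl, const char *s,
                       const unsigned char fold[256])
{
    const unsigned char *t = (const unsigned char *)tmpl;
    const unsigned char *p = (const unsigned char *)s;
    assert(fold[0] == 0);
    for (;; t++, p++) {
        int diff = (int)*t - (int)fold[*p];  /* only *p is looked up */
        if (diff != 0 || *t == 0)
            return diff;                     /* mismatch or both ended */
    }
}
```

This saves one table load per character compared with folding both strings.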

Here is some code I haven't run yet:
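proc tstrcmp a_str, b_str, x_tab ; asciiz strings; x_tab -> 256-byte sort-value table
;-----------------------------------------------------------------------
; x_tab[0] must be 0 and every other entry nonzero, so that only the
; terminator translates to 0. Returns eax < 0, = 0 or > 0.
;-----------------------------------------------------------------------
    push ebx
    push esi
    push edi
;-----------------------------------------------------------------------
    mov ebx,[x_tab]           ; ebx -> translation table (used by xlatb)
    mov esi,[a_str]
    mov edi,[b_str]
    xor eax,eax               ; keep the upper 24 bits of eax clear
;-----------------------------------------------------------------------
.next:
    lodsb                     ; al = char from a_str, esi++
    xlatb                     ; al = [ebx+al], sort value of a's char
    movzx edx,byte [edi]      ; edx = char from b_str
    inc edi
    movzx edx,byte [ebx+edx]  ; edx = sort value of b's char
    sub eax,edx               ; 32-bit difference, so the sign is always right
    jnz .finish               ; values differ: done
    test edx,edx              ; equal: was it the terminator?
    jnz .next
.finish:
;-----------------------------------------------------------------------
    pop edi
    pop esi
    pop ebx
;-----------------------------------------------------------------------
    ret
;-----------------------------------------------------------------------
endp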
Posted on 2005-09-05 05:07:27 by Shoo
Hi Shoo!


Quoting Shoo:
"As for tables that fit into 256 bytes, several are needed, at least 3-4:
1. uppercasing
2. lowercasing
3. comparison, returning a sort value"


Can anybody tell me why we need separate tables 1 & 2???
Posted on 2005-09-05 05:43:54 by Bohdan
try to guess ;)
--------
In reality, although the topic is about comparison, I'm also thinking about uppercasing/lowercasing, and doing that fast needs separate tables. Also, if one word is already uppercased, you can call a function that uppercases only the second word; the same goes for an already lowercased word, and both cases can be handled by one function by passing it a pointer to the appropriate table.
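One way to read this answer, sketched in C under stated assumptions: a single 256-byte table maps each byte to only one value, so uppercasing and lowercasing need two different tables (for comparison alone, one folding table is enough), while one conversion routine can serve both directions by taking a pointer to the table. Only ASCII and the Windows-1251 А-Я/а-я block are handled; `build_case_tables`, `convert_case`, `upper_tab` and `lower_tab` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

static unsigned char upper_tab[256], lower_tab[256];

/* Builds separate upper- and lowercase tables (ASCII letters plus the
   Windows-1251 block 0xC0-0xDF / 0xE0-0xFF only). */
void build_case_tables(void)
{
    for (int c = 0; c < 256; c++)
        upper_tab[c] = lower_tab[c] = (unsigned char)c;
    for (int c = 'a'; c <= 'z'; c++) {
        upper_tab[c] = (unsigned char)(c - 0x20);
        lower_tab[c - 0x20] = (unsigned char)c;
    }
    for (int c = 0xE0; c <= 0xFF; c++) {
        upper_tab[c] = (unsigned char)(c - 0x20);
        lower_tab[c - 0x20] = (unsigned char)c;
    }
}

/* One routine for both directions: the table pointer decides the case. */
void convert_case(char *s, const unsigned char *tab)
{
    assert(s != NULL && tab != NULL);
    for (unsigned char *p = (unsigned char *)s; *p; p++)
        *p = tab[*p];
}
```

With this layout, `convert_case(word, upper_tab)` and `convert_case(word, lower_tab)` share the same loop, which matches the "one function, pointer to the appropriate table" idea above.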
(I'm also thinking about different string formats: do you know the story about the young and old buffalos and some cows beyond the little river? ;) )
Posted on 2005-09-05 05:57:02 by Shoo