hi!
Here is a case-insensitive string comparison that uses a character table. I'm not sure it works like lstrcmpi, but it makes it possible to compare strings containing national characters, which can be spread across the upper ASCII range in non-alphabetical order. Although I'm not familiar with languages other than English, you can easily rebuild the table for your own language using the fasm compiler. Currently it compares English and Ukrainian/Russian/Belarusian/Bulgarian via a "mixed" alphabet. When proper characters are compared, the result can be used for alphabetical sorting. Any non-alphabetical character will always be greater or smaller than any alphabetical one, depending on the factor passed to the function.
Made between work, may contain errors :) Regards!
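For anyone who wants to play with the idea outside fasm, here is a rough C sketch of such an order table. It is not the attached source: build_order_table, its parameters and the "AaBb..." pair layout are made up for illustration, and alpha_first merely stands in for the "factor" mentioned above. Rank 0 is kept for the string terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the attached source.
   alphabet holds `pairs` upper/lower pairs in sorting order ("AaBb...");
   both letters of a pair get the same rank, so the compare ignores case.
   alpha_first != 0 ranks letters below all other characters. */
static void build_order_table(unsigned char table[256],
                              const unsigned char *alphabet,
                              size_t pairs, int alpha_first)
{
    unsigned char is_alpha[256] = {0};
    unsigned rank = 1;                 /* rank 0 is kept for the terminator */
    size_t i;
    int c, pass;

    assert(table && alphabet && pairs >= 1 && pairs <= 127);
    for (i = 0; i < 2 * pairs; i++)
        is_alpha[alphabet[i]] = 1;
    table[0] = 0;

    /* Two passes; which group is ranked first depends on alpha_first. */
    for (pass = 0; pass < 2; pass++) {
        int do_alpha = (pass == 0) ? (alpha_first != 0) : (alpha_first == 0);
        if (do_alpha) {
            for (i = 0; i < pairs; i++, rank++) {
                table[alphabet[2 * i]]     = (unsigned char)rank;
                table[alphabet[2 * i + 1]] = (unsigned char)rank;
            }
        } else {
            for (c = 1; c < 256; c++)
                if (!is_alpha[c])
                    table[c] = (unsigned char)rank++;
        }
    }
}
```

A "mixed" alphabet would simply list the Latin and Cyrillic pairs in whatever order you want them sorted. Keeping two tables, one built with alpha_first set and one without, avoids rebuilding on every call.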
Good afternoon, Shoo
How can I translate it into MASM?
regards
Hi Shoo and dcskm4200.
Here is my updated version of the table lookup algo I posted in "Faster Strncmpi".
It deals with English/Russian/Ukrainian characters, but it's very easy to adapt it to other codepages.
Delphi code that generates the table automatically is included (as a comment).
Results (P4-2400, i845G):
All cases match...
Average execution time: 451 cycles.
All cases dont match...
Average execution time: 720 cycles.
Best Regards...
Hello, Bohdan
Here are my test results:
Hit any key to start ...
All cases match...
Average execution time: 383 cycles.
All cases dont match...
Average execution time: 548 cycles.
regards
Here is a small improvement to my algo.
1) The lookup table is generated automatically according to the current codepage ( StrCmpiBuitdTable )
2) Better instruction pairing
Can be buggy, needs testing.
Regards...
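The attachment itself is not shown in the thread, so the following is only a guess at the general idea, written in portable C rather than Win32 asm: fill a 256-byte fold table by asking the C runtime how each byte uppercases in the current locale. build_fold_table is a hypothetical name, and toupper is swapped in here for whatever the original routine calls.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for a table builder: table[c] becomes the
   uppercase form of byte c according to the active C locale. */
static void build_fold_table(unsigned char table[256])
{
    int c;

    assert(table != 0);
    for (c = 0; c < 256; c++)
        table[c] = (unsigned char)toupper(c);   /* c is in 0..255, a valid argument */
}
```

In the default "C" locale only a-z are folded. Calling setlocale(LC_CTYPE, "") from <locale.h> before building the table would pick up the user's locale instead, provided the runtime supports a single-byte codepage for it.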
hi dcskm4200
I'll do a MASM equivalent for you, but meanwhile you should add fasm to your armory ;)
regards!
Hello, Shoo
Thanks for doing that.
As a newbie, I think I've only just started the long, hard process. I'm getting tired now. If I take fasm into my armory as well, my brain won't have any energy left.
best regards.
hi dcskm4200
1. Here is a working converted source, only as an example, because I'm going to drop it and use xlat instead (I'll start coding it once I can picture it clearly in my head).
2. IMHO, converting sources from one compiler to another is harder than just having several compilers and using the appropriate one for each source.
3. I'm surprised you are interested in uppercasing Cyrillic letters :) Or are you just interested in the algo? Also, programming is just my hobby, so I can't say my code is always correct ;)
regards!
I guess we can all agree that code not supporting international characters isn't all that useful.
A table-based approach should be pretty universal, at least as long as we are dealing with single-byte-per-character strings. This probably works with everything except Japanese and Chinese?
How about speed, though - for a straightforward table we'd need 256 bytes for an uppercase table, and 256 bytes for a lowercase one (if we decide to use the tables for more than just case-insensitive compare). This must affect speed somewhat because of cache, and of course because of memory dependency...
Another approach could be generating the routines on the fly... this would require detecting which range(s) change between upper and lower case (or are all alphabets contiguous? I think not).
Unicode would be more universal, but it probably isn't very suitable to implement with tables...
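To get a feel for the "detect the ranges" idea, here is a small C sketch that scans all 256 byte values and collects the contiguous runs that change when uppercased. struct case_range and find_lowercase_ranges are names made up for this example, and toupper from the C runtime is only an assumption about where the case information would come from.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One contiguous run of byte values that have a different uppercase form. */
struct case_range { unsigned char first, last; };

/* Hypothetical helper: writes up to max_ranges runs into out and
   returns how many were written. */
static size_t find_lowercase_ranges(struct case_range *out, size_t max_ranges)
{
    size_t n = 0;
    int c = 0;

    assert(out != NULL || max_ranges == 0);
    while (c < 256 && n < max_ranges) {
        if (toupper(c) == c) {          /* unchanged: not a lowercase letter */
            c++;
            continue;
        }
        out[n].first = (unsigned char)c;
        while (c < 256 && toupper(c) != c)
            c++;                        /* extend the run while bytes still change */
        out[n].last = (unsigned char)(c - 1);
        n++;
    }
    return n;
}
```

In the default "C" locale this finds a single run, 'a' to 'z'. With a national single-byte codepage selected there may be several runs, and upper and lower forms are not always a constant distance apart, which is where range-based generated code gets harder than a plain table.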
Hey, Shoo
Thank you very much.
It works fine.
By the way, I also appreciate your travesty. You are not only a coder but also an artist.
best regards
f0dder, there is no such thing as uppercase and lowercase in Chinese. I think the same thing applies to Japanese. ;)
Hello, roticv
Although there is no such thing as uppercase and lowercase in Chinese, there is the distinction between Traditional and Simplified Chinese. Here is a piece of software that converts between them.
regards
f0dder, there is no such thing as uppercase and lowercase in Chinese. I think the same thing applies to Japanese. ;)
Ah, thinking of it, that makes sense - ideographs and such. Shows how much of a b**** it can be to support foreign languages correctly :)
I think multibyte chars need calculation, or a mixed method. BTW, in Japanese there are katakana and hiragana, which differ graphically but sound the same.
As for tables that fit into 256 bytes, several are needed, at least 3-4:
1. uppercasing
2. lowercasing
3. comparing, returning a sort value
1 & 2 are clear. For 3, the char code should be converted not into the appropriate case but into an order value according to the char's position in the alphabet. Why did I say 3-4? It is possible to have 2 tables, where one places non-alphabetic characters before alphabetic ones and the other vice versa (it is not good when alphabetic characters are separated or surrounded by non-alphabetic ones). In general, it is possible to have many such tables; just pass the function a pointer to the one you need.
Also, for speed, it is good to have a separate function for partial uppercasing: for example, if you have a template name in your data and are sure it is lowercase, why should the function calculate its case on every iteration? It would do so only for the second (unknown) string.
Here is some code, not run yet:
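proc tstrcmp a_str, b_str, x_tab ; asciiz
; NOTE: the bracketed memory operands were missing from the post; those below are reconstructed from context (assumption)
;-----------------------------------------------------------------------
push ebx
push esi
push edi
;-----------------------------------------------------------------------
mov ebx,[x_tab] ; ebx -> 256-byte translation table (used by xlatb)
mov esi,[a_str] ; esi -> first string (read by lodsb)
mov edi,[b_str] ; edi -> second string
;-----------------------------------------------------------------------
.next:
lodsb ; al = next char of a_str
test al,al
jne @F
sub al,[edi] ; a_str ended: 0 minus current b_str char (0 if both ended)
jmp .finish
@@:
movzx edx,byte [edi] ; edx = current char of b_str
xlatb ; al = table[al]
inc edi
sub al,[ebx+edx] ; subtract translated b_str char
jz .next ; equal so far, keep going
.finish:
movsx eax,al ; sign-extend the byte difference into eax
;-----------------------------------------------------------------------
pop edi
pop esi
pop ebx
;-----------------------------------------------------------------------
ret
;-----------------------------------------------------------------------
endp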
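For readers who don't use fasm, roughly the same loop can be sketched in C. This is a paraphrase and not a translation of the routine above: tstrcmp_c is a made-up name, the difference is computed in int instead of a byte, and the table is assumed to map 0 to 0 and no other character to 0.

```c
#include <assert.h>

/* Hypothetical C rendering of a table-driven compare.
   Assumes table[0] == 0 and table[c] != 0 for every other c. */
static int tstrcmp_c(const unsigned char *a, const unsigned char *b,
                     const unsigned char table[256])
{
    assert(a && b && table);
    for (;; a++, b++) {
        int diff;

        if (*a == 0)
            return -(int)*b;            /* a ended: 0 if b ended too, else negative */
        diff = (int)table[*a] - (int)table[*b];
        if (diff != 0)
            return diff;                /* also covers b ending first (table[0] == 0) */
    }
}
```

Working in int sidesteps one thing to watch for in a byte-wide version: a difference such as 200 - 10 does not fit a signed byte, so sign-extending it would flip the sign.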
Hi Shoo!
As for tables that fit into 256 bytes, several are needed, at least 3-4:
1. uppercasing
2. lowercasing
3. comparing, returning a sort value
Can anybody tell me why we need separate tables 1 & 2?
try to guess ;)
--------
In reality, although the topic is about comparing, I'm also thinking about uppercasing/lowercasing; to do that fast, separate tables are needed. Also, if you already have an uppercased word, you can call a function that uppercases only the second word; the same goes for a lowercased word, and this can be done with one function by passing it a pointer to the appropriate table.
(I'm also thinking about different string formats: do you know the story about the young and old buffalo and some cows beyond the little river ;) )
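The "one function, different table pointer" point might look like this in C. Both function names are invented for the sketch, and the tables are assumed to map 0 to 0 so that the terminator survives translation.

```c
#include <assert.h>

/* Uppercase or lowercase in place, depending on which table is passed. */
static void translate_in_place(unsigned char *s, const unsigned char table[256])
{
    assert(s && table);
    for (; *s; s++)
        *s = table[*s];
}

/* key is assumed to be in the case that table produces already, so only
   text goes through the table. Assumes table[0] == 0. */
static int cmp_prefolded(const unsigned char *key, const unsigned char *text,
                         const unsigned char table[256])
{
    assert(key && text && table);
    for (;; key++, text++) {
        int diff = (int)*key - (int)table[*text];
        if (diff != 0 || *key == 0)
            return diff;                /* mismatch, or both strings ended */
    }
}
```

With an uppercase key you pass the uppercase table, with a lowercase key the lowercase one; the compare loop itself stays the same and does one table lookup per character instead of two.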