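cmp eax,3Ah ; CF = 1 if eax < 3Ah
sbb ecx,ecx ; ecx = -1 if CF was set, else 0
cmp eax,30h ; CF = 1 if eax < 30h
adc ecx,0 ; ecx goes back to 0 if eax < 30h
jns @notdigit ; ecx >= 0 means eax is outside "0".."9"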
Posted on 2002-02-13 08:57:08 by The Svin
I had a look at my own words and thought I would never figure out what they were about :)
It's how to check without jcc whether the char (in eax) is <= 39h ("9") and
>= 30h ("0").
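
Read as arithmetic, the sequence leaves ecx = -1 exactly when 30h <= eax <= 39h. A minimal C sketch of that arithmetic, with the carry flag modeled as a 0/1 variable; the function names are made up for this illustration and are not from the thread:

```c
#include <stdint.h>

/* Models: cmp eax,3Ah / sbb ecx,ecx / cmp eax,30h / adc ecx,0.
   Returns the final ecx: 0xFFFFFFFF for "0".."9", otherwise 0. */
uint32_t digit_mask_sbb_adc(uint32_t eax)
{
    uint32_t cf  = eax < 0x3Au;   /* cmp eax,3Ah: CF = 1 if eax < 3Ah (unsigned) */
    uint32_t ecx = 0u - cf;       /* sbb ecx,ecx: -1 if CF was set, else 0 */
    cf = eax < 0x30u;             /* cmp eax,30h: CF = 1 if eax < 30h */
    ecx += cf;                    /* adc ecx,0: add the carry back in */
    return ecx;
}

/* jns @notdigit: the sign bit of ecx is set only for digits. */
int is_digit_sbb_adc(uint32_t eax)
{
    return (digit_mask_sbb_adc(eax) >> 31) != 0;
}
```

The mask form is handy when the result feeds further branchless code instead of a jump.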
Posted on 2002-02-13 12:15:05 by The Svin
how about

lea ecx,[eax][-3Ah]
sub eax,30h
xor ecx,eax

Then you do a js/jns depending on whether you're looking for something inside the range or not.
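
A rough C model of this idea, assuming the lea computes eax-3Ah and that eax holds a zero-extended character (0..255). For arbitrary 32-bit values the subtractions can wrap and give a wrong sign. The function name is hypothetical:

```c
#include <stdint.h>

/* Sign of the xor is set when exactly one of the two differences is negative,
   which for small inputs happens only when 30h <= eax < 3Ah. */
int in_digit_range_xor(uint32_t eax)
{
    uint32_t ecx = eax - 0x3Au;      /* lea: sign bit set when eax < 3Ah */
    uint32_t low = eax - 0x30u;      /* sub eax,30h: sign bit set when eax < 30h */
    return ((ecx ^ low) >> 31) != 0; /* xor ecx,eax, then js = in range */
}
```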
Posted on 2002-02-14 02:33:33 by Eóin
Very creative idea.
But it should be
lea ecx,[eax-1][-39h]
In my code 3Ah is there so that cmp clears CF when eax is higher than 39h (the biggest acceptable value of the range).
One more thing: the value in eax should stay unchanged.
But in the "don't need the value any more" case your method will work faster than mine.
In the "I need it" case we have to add eax,30h at the end
(or better lea eax,[eax+30h], to do it in parallel without spoiling the flags).
Even in that case it wouldn't be slower (though not faster).
Your algo also has a good chance at superscalar performance
when it is used in loops. Mine has dependencies.
Posted on 2002-02-14 02:53:11 by The Svin
This may be a very dumb question, but what is
[eax][-3Ah]? Isn't this the same as [eax-3Ah]?
What is this notation good for? I have never used it
before...

thank you...
Posted on 2002-02-14 04:22:01 by mob
Yes, it is.
We use it just to show that 3Ah is the upper limit.
Posted on 2002-02-14 04:32:18 by The Svin
Would this be more understandable and faster?

cmp eax,'9' ; 1
seta dl ; 1
cmp eax,'0' ; 1
sbb dl,0 ; 2
jne @notdigit ; 1

vs

cmp eax,3Ah ; 1
sbb ecx,ecx ; 2
cmp eax,30h ; 1
adc ecx,0 ; 2
jns @notdigit ; 1

6 vs 7 micro-ops

But the next code is really faster:

mov ecx,'0'-1 ; 1
sub ecx,eax ; 1 ecx<0 if eax>='0'
lea edx,[eax-'9'-1] ; 1 edx<0 if eax<='9'
and ecx,edx ; 1
js @@notdigit ; 1
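
For reference, a small C model of the sub/lea/and version, assuming the lea operand is eax-'9'-1 (as its comment implies) and that eax holds a zero-extended character. Note that the sign bit of the and result is set when the character IS a digit. The function name is invented for this sketch:

```c
#include <stdint.h>

int is_digit_and(uint32_t eax)
{
    uint32_t ecx = (uint32_t)('0' - 1) - eax;  /* sign bit set when eax >= '0' */
    uint32_t edx = eax - ((uint32_t)'9' + 1u); /* sign bit set when eax <= '9' */
    return ((ecx & edx) >> 31) != 0;           /* both negative = inside the range */
}
```

Unlike the sub/xor variant, eax itself is never modified here, and the two differences do not depend on each other.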
Posted on 2002-02-15 15:32:32 by Nexo
Nexo,
Code derived from Eóin's will be faster because it makes it possible
to run instructions in parallel:
lea ecx,[eax-1][-39h] ;1
sub eax,30h ;0
xor ecx,eax ;1
lea eax,[eax+30h] ;0
js
2 clocks until js, even with the cc check in parallel.
Though the micro-op count in your code is right, there are dependencies
through the whole code sequence.
Posted on 2002-02-15 17:33:43 by The Svin
The Svin,
Why would my code be slower?
1. If the clock counts you gave are for a Pentium, you are mistaken.
lea ecx,[eax-1][-39h] ;1
sub eax,30h ;1-1
xor ecx,eax ;1
lea eax,[eax+30h] ;2-1 AGI penalty for prev op in V-pipe
js
When you quote clock counts, you need to apply at least the basics of optimization.
2. On the latest processors the situation is different.
I measured the execution of the code with rdtsc on an AthlonXP processor.
1)REPT 256
lea ecx,[eax-1][-39h]
sub eax,30h
xor ecx,eax
lea eax,[eax+30h]
ENDM
ret

2)REPT 256
mov ecx,'0'-1
sub ecx,eax
lea edx,[eax-'9'-1]
and ecx,edx
ENDM
ret

1) 760 clocks 2) 384 clocks
The reason for this is the cyclic execution :)
It is an example of an incorrect measurement.
In reality the second code has the greater potential for parallelism.
Posted on 2002-02-16 04:27:29 by Nexo
js @@notdigit ; 1

change it to jns @@notdigit.

Let's write a proc that checks an ASCIIZ string for characters
that are not in the "0" - "9" range.

Given: pointer to an ASCIIZ string.
Out: eax = 1 if all characters are in the "0"-"9" range,
eax = 0 if any character before the terminating zero is out of the range.

Example
for string:
"123453456",0 eax = 1
for string:
"1234zz1234",0 eax = 0

And then we run our tests again.
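
As a reference point for such tests, here is a plain C sketch of the proposed proc. The name all_digits is made up, and returning 1 for an empty string is an assumption, since the spec above does not cover that case. The range test uses the usual unsigned subtract-and-compare idiom:

```c
#include <stdint.h>

/* Returns 1 if every character before the terminating zero is "0".."9",
   0 otherwise. An empty string yields 1 (assumption, not in the spec). */
int all_digits(const char *s)
{
    for (; *s != '\0'; ++s) {
        uint32_t c = (unsigned char)*s;
        if (c - (uint32_t)'0' > 9u)   /* wraps to a huge value when c < '0' */
            return 0;
    }
    return 1;
}
```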

It's good to have you here.
One more person who works on careful and original code solutions.
Posted on 2002-02-16 08:36:31 by The Svin
Nexo,
I tested your code on PMMX and PIII.
It was much faster than the one derived from Eóin's.

It may sound strange, but it's a lucky day for me when someone beats
my algo :)
So thanks.
Stay with us.
Posted on 2002-02-16 09:17:21 by The Svin
Now tests (on both PMMX and PIII) are in favour of my new
version:
lea ecx,[eax-1][-39h]
lea edx,[eax-30h]
xor ecx,edx
Posted on 2002-02-16 12:55:26 by The Svin
In any case, these small slices of code are not a finished optimization. Optimizing such fragments in isolation is pointless, as the repeated-code measurement I showed makes visible. In a concrete implementation a different solution may turn out best. Everything depends on the real environment of these instructions.
In real-life applications I do something simpler:
test , 1; mask for digit
This is the trivial, standard code that compilers use for classifying characters. A single table defines special characters, letters, digits and hexadecimal digits. It is also used for conversion to lower and upper case, so national language features can be handled. Most importantly, it is approximately 2.5 times faster for detecting digits. It has been my method of choice for several years, on any processor. Things should be kept simple.
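
A minimal sketch of the table approach in C, assuming ASCII. The table name, the mask bits and the init function are all hypothetical; a real runtime library would normally ship a precomputed table:

```c
#include <stdint.h>

enum { CT_DIGIT = 1, CT_XDIGIT = 2 };   /* hypothetical mask bits */

static uint8_t ctype_tbl[256];          /* one class byte per character code */

void init_ctype_tbl(void)
{
    for (int c = '0'; c <= '9'; ++c) ctype_tbl[c] = CT_DIGIT | CT_XDIGIT;
    for (int c = 'a'; c <= 'f'; ++c) ctype_tbl[c] |= CT_XDIGIT;
    for (int c = 'A'; c <= 'F'; ++c) ctype_tbl[c] |= CT_XDIGIT;
}

/* One load and one test per character, whatever class is asked for. */
int is_digit_tbl(unsigned char c)  { return (ctype_tbl[c] & CT_DIGIT)  != 0; }
int is_xdigit_tbl(unsigned char c) { return (ctype_tbl[c] & CT_XDIGIT) != 0; }
```

The cost is a 256-byte table, which is why this fits character classes well but not wide, sparse value ranges.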
Posted on 2002-02-16 15:00:24 by Nexo
I wouldn't call it absolutely senseless, but there is truth in your
words about the real environment.
That was the reason why I offered a real task: to write a validation proc.

As to tables and masks:
I work mostly in the fields of large data processing and databases.
I rely heavily on different kinds of tables in my work, and without them
doing my job would be almost impossible given the speed required.
So I generally agree with you.
Yet there are lots of cases when you need to determine ranges and
table masks would be too big for that.
Imagine that each incoming value requires its own proc,
and that the values come in groups.
Values within a group are contiguous, but the distance between
groups may be big.
For example:
1,2,3,4,5
101,102,103,104,105,106,107
290789,290790....
And there may be tens of such groups with large numbers of elements.

So you cannot create one contiguous table of jmps to the appropriate procs (it would be
too big even for today's memory sizes).
Instead you can create a table for each group and first check which group the value belongs to,
using the following macro:


IfInRange macro reg,uplimit,lowlimit,lbl
lea ecx,[reg-1][-uplimit]
lea edx,[reg-lowlimit]
xor ecx,edx
js lbl
endm


Then, to dispatch through the appropriate jmp table, you can use it like this:


mov eax, Value
IfInRange eax,5,1,group1
IfInRange eax,107,101,group2
IfInRange eax,290812290789,290789,group3
........

group1:
sub eax,1
call dword ptr tbl1[eax*4]
ret
group2: sub eax,101
call dword ptr tbl2[eax*4]
ret
group3:
........

group4: .......

It's sloppy code; in reality it's done in a more effective way. I'm giving it just as an idea of where
range determination cannot easily be done with a table.
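
The check inside the IfInRange macro can be modeled in C roughly as follows. This is a sketch under the assumption that neither difference overflows a signed 32-bit value; the function name is made up:

```c
#include <stdint.h>

/* Nonzero when lowlimit <= v <= uplimit. */
int in_range_xor(int32_t v, int32_t uplimit, int32_t lowlimit)
{
    uint32_t ecx = (uint32_t)v - 1u - (uint32_t)uplimit; /* lea ecx,[reg-1][-uplimit] */
    uint32_t edx = (uint32_t)v - (uint32_t)lowlimit;     /* lea edx,[reg-lowlimit] */
    return ((ecx ^ edx) >> 31) != 0;                     /* xor ecx,edx / js lbl */
}
```

The argument order follows the macro (upper limit first), and the tested value is left untouched, so several checks can be chained on the same register.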
Posted on 2002-02-16 16:21:18 by The Svin
The Svin, that looks suspiciously like my code here. Tut tut tut :grin:
Posted on 2002-02-16 16:54:11 by Eóin
Indeed :)
Posted on 2002-02-16 17:09:57 by The Svin
You are showing the solution to a different task. The initial register may change, and the limits differ from the limits for digits. Therefore the solutions shown earlier are not applicable here. And this number we shall drop: 290812290789 :)
1.
mov eax, 92
IfInRange eax, 5,1, group1
IfInRange eax, 16,10, group1
IfInRange eax, 27,20, group1
IfInRange eax, 38,30, group1
IfInRange eax, 49,40, group1
IfInRange eax, 60,50, group1
IfInRange eax, 71,70, group1
IfInRange eax, 82,80, group1
IfInRange eax, 93,90, group1
17 clocks
2.
FirstLimit macro reg, lowlim, uplim, lbl
reglim equ reg
lastlowlim=lowlim ; remember how much has already been subtracted from reg
sub reglim, lowlim
cmp reglim, uplim-lowlim
jbe lbl
endm

NextLimit macro lowlim, uplim, lbl
sub reglim, lowlim-lastlowlim
cmp reglim, uplim-lowlim
jbe lbl
lastlowlim=lowlim
endm

FirstLimit eax, 1,5, group1
NextLimit 10,16, group1
NextLimit 20,27, group1
NextLimit 30,38, group1
NextLimit 40,49, group1
NextLimit 50,60, group1
NextLimit 70,71, group1
NextLimit 80,82, group1
NextLimit 90,93, group1
9 clocks

All clock ticks for AthlonXP.

sub reg,lowlim
cmp reg,uplim-lowlim
jbe lbl

This is the standard sequence for testing whether a value lies in a range.
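
In C the same sequence is the familiar unsigned trick: after subtracting the low limit, values below it wrap around to huge numbers, so one unsigned compare covers both bounds. The sketch below also models the chained macros by tracking how much has been subtracted so far; the struct and function names are assumptions made for this example, and lo <= hi is assumed for every range:

```c
#include <stdint.h>
#include <stddef.h>

/* sub reg,lowlim / cmp reg,uplim-lowlim / jbe lbl */
int in_range_u(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v - lo <= hi - lo;
}

struct range { uint32_t lo, hi; };

/* Returns the index of the first range containing v, or -1 if none does.
   The ranges may be listed in any order. */
int find_range(uint32_t v, const struct range *r, size_t n)
{
    uint32_t last_lo = 0;                 /* total subtracted from v so far */
    for (size_t i = 0; i < n; ++i) {
        v -= r[i].lo - last_lo;           /* sub reglim, lowlim-lastlowlim */
        last_lo = r[i].lo;
        if (v <= r[i].hi - r[i].lo)       /* cmp reglim, uplim-lowlim / jbe */
            return (int)i;
    }
    return -1;
}
```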
Posted on 2002-02-17 06:45:43 by Nexo
Thanks for your code.
Well, Nexo, if all cases are known and we are fine with arranging the
checks by value, then it is easier just to check the low limits, starting with the range of biggest values.
cmp eax,lowlimitofhighestgroup
jae GroupHighest
cmp eax,lowlimitofnexttoHighestGroup
jae ....
and so on
The problem is that we can probably expect some groups
to be more probable, and want to check the more probable ones first.
And the probabilities most likely won't line up with the values of the numbers.

BTW I'm working with the Pentium family. I'm in no way against Athlon,
but in my reality the truth about speed is connected to Pentium optimization. I'm not an expert on Athlon, but AFAIK both xor and
lea have special issues on Athlon.
I just saw examples where somebody showed that mov reg,0
was faster on Athlon than xor reg,reg, and also some
lea variants that include scaling and + 0 (I have no idea whether
that was for alignment or for anything else).

Stay with us.
It's interesting to discuss asm prog topics with you.
Good luck.
Posted on 2002-02-17 07:18:51 by The Svin
And right away the total number of conditions doubles. In my code an increasing order of values is not mandatory.
FirstLimit eax, 20,27, group1
NextLimit 1,5, group1
NextLimit 10,16, group1
This code should work with any order. So as far as the probability of the ranges goes, I do not see a problem with ordering.
The instructions
mov reg, 0 - xor reg, reg
lea reg,
are useful on any processor for aligning the subsequent code.

MOV reg32, imm32; DirectPath 1 clock
XOR reg32, reg32; DirectPath 1 clock
These are equivalent.

If there is interest, in the future I can run the code on both a PentiumIII and an AthlonXP. But only when I am at home.
Posted on 2002-02-17 08:29:52 by Nexo
"And at once you will receive growth of a total number of conditions twice."

I don't understand this line. If I arrange the checks by value, the number of conditions will
be the same; there will be fewer instructions and a faster checking phase.
The only requirement is to separate the group handling into per-group labels and add one instruction at the start of
each group. The difference can then be calculated:
n = number of groups
n*3 = instructions in your macros to check n groups
n*2 = instructions in my version to check n groups
n*3 - n*2 - n = 0
(-n because I need to add one sub instruction to each group handler)
So there is no difference in size. The sub that calculates the offset into the proc table is performed only when
the group is found, not on each check.

The code:


NextLimit 10,16, group1
NextLimit 20,27, group1
NextLimit 30,38, group1
NextLimit 40,49, group1
NextLimit 50,60, group1
NextLimit 70,71, group1
NextLimit 80,82, group1
NextLimit 90,93, group1
which has 3 instructions per line (3*8=24),

may be replaced, with the same number of conditions, using the method from my previous post,
which has two instructions per group and is therefore smaller and faster (given that the value is in one of the groups):
cmp eax,90
jae group8
cmp eax,80
jae group7
cmp eax,70
jae group6
cmp eax,50
jae group5
cmp eax,40
jae group4
cmp eax,30
jae group3
cmp eax,20
jae group2
cmp eax,10
jae group1
notincase:

........

group1: sub eax,10
jmp dword ptr [eax*4][offset Gr1Tbl]
group2: sub eax,20
jmp dword ptr [eax*4][offset Gr2Tbl]
and so on.
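
A C sketch of this descending chain of low-limit checks, with the limits taken from the listing above and a made-up function name. It only decides which group a value would fall into: a value that sits in a gap between groups (65, say) still lands in the group below it, which is why the method assumes the value belongs to one of the groups:

```c
#include <stdint.h>

/* Returns the group number 1..8, or 0 if v is below every low limit. */
int group_by_low_limit(uint32_t v)
{
    static const uint32_t low[8] = { 90, 80, 70, 50, 40, 30, 20, 10 };
    for (int i = 0; i < 8; ++i)
        if (v >= low[i])          /* cmp eax,low / jae groupN */
            return 8 - i;
    return 0;                     /* falls through to notincase */
}
```

If gap values can occur, each group handler would still need an upper-limit check after its own sub.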

As you see, the number of conditions is the same.

It would be clearer if your code showed what is in group1; since all your code jumps
to the same label, maybe I'm missing something.

PS. A test on Pentium III would be great.
Thanks for your msg.
Posted on 2002-02-17 09:42:05 by The Svin