cmp eax,3Ah
sbb ecx,ecx ; ecx = -1 if eax < 3Ah, else 0
cmp eax,30h
adc ecx,0 ; ecx goes back to 0 if eax < 30h
jns @notdigit ; sign clear = not a digit
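To check the logic of the sequence above without an assembler, it can be modelled in C. This is only a sketch: the function name is hypothetical, and the carry flag is imitated with an ordinary unsigned comparison.

```c
#include <stdint.h>

/* Models: cmp eax,3Ah / sbb ecx,ecx / cmp eax,30h / adc ecx,0.
   Returns the final ecx: 0xFFFFFFFF (sign set) for '0'..'9', otherwise 0.
   digit_mask_sbb_adc is a hypothetical name, not from the thread. */
uint32_t digit_mask_sbb_adc(uint32_t eax)
{
    uint32_t cf  = eax < 0x3Au;   /* cmp eax,3Ah: CF = 1 when eax is below 3Ah */
    uint32_t ecx = 0u - cf;       /* sbb ecx,ecx: ecx = -CF                    */
    cf   = eax < 0x30u;           /* cmp eax,30h: CF = 1 when eax is below 30h */
    ecx += cf;                    /* adc ecx,0:   ecx = ecx + CF               */
    return ecx;
}
```

The sign bit of the result is what jns looks at: it is set only when the first compare produced a carry and the second did not.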
I had a look at my own words and thought I would never figure out what they were about :)
It's how to check, without a separate jcc for each bound, whether the char in eax is <=39h ("9") and
>=30h ("0").
How about:
lea ecx,[-3Ah]
sub eax,30h
xor ecx,eax
Then you do a js/jns, depending on whether you're looking for something inside the range or not.
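The lea operand in the listing above shows no base register, so the following C model rests on an assumption: that the base is eax, i.e. ecx = eax - 3Ah. Under that assumption the xor leaves the sign bit set exactly when the two differences have opposite signs, which happens only inside the range. The function name is hypothetical.

```c
#include <stdint.h>

/* Models: lea ecx,[eax-3Ah] / sub eax,30h / xor ecx,eax.
   ASSUMPTION: the lea base register is eax (not shown in the thread).
   Returns the sign flag after the xor: 1 for '0'..'9', 0 otherwise.
   Intended for character codes 0..255 only; very large inputs can wrap. */
int in_digit_range_xor(uint32_t eax)
{
    uint32_t ecx = eax - 0x3Au;   /* negative while eax is at most 39h      */
    eax -= 0x30u;                 /* non-negative once eax is at least 30h  */
    ecx ^= eax;                   /* sign set only when the signs differ    */
    return (int)(ecx >> 31);
}
```

With this reading, js is taken for a digit and jns for anything else.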
Very creative idea.
But it should be
lea ecx,[-39h]
In my code 3Ah is there so that CF is set only when eax is no higher than 39h (the biggest acceptable value of the range).
One more thing: the value in eax should stay unchanged.
But in the "don't need the value any more" case your method will work faster than mine.
In the "I need it" case we have to add eax,30h at the end
(or better, use lea eax, so it runs in parallel without spoiling the flags).
Even in that case it wouldn't be slower (though not faster).
Your algo also has a good chance at superscalar performance
if it is used in loops. Mine has dependencies.
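Which constant is right depends on the operand that the listing does not show. Assuming again that the lea base is eax, a small C model makes it easy to try the boundary character "9" with both constants. The helper and its parameter are hypothetical.

```c
#include <stdint.h>

/* Models: lea ecx,[eax-upper] / sub eax,30h / xor ecx,eax, returning the
   sign flag. ASSUMPTION: the lea base register is eax.
   'upper' is the constant under discussion (3Ah or 39h). */
int xor_range_test(uint32_t eax, uint32_t upper)
{
    uint32_t ecx = eax - upper;   /* first difference                   */
    uint32_t low = eax - 0x30u;   /* second difference (sub eax,30h)    */
    return (int)((ecx ^ low) >> 31);
}
```

In this model xor_range_test('9', 0x3A) gives 1 and xor_range_test('9', 0x39) gives 0, because 39h - 39h is zero and zero has no sign bit. So under the stated assumption the 3Ah form is the one that accepts "9".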
This may be a very dumb question, but what is
[-3Ah]? Isn't this the same as ?
What is this notation good for? I have never used it
before...
Thank you...
Yes, it is.
We use it just to show that 3Ah is the upper limit.
So would this be more understandable and faster?
cmp eax,'9' ; 1
seta dl ; 1
cmp eax,'0' ; 1
sbb dl,0 ; 2
jne @notdigit ; 1
vs
cmp eax,3Ah ; 1
sbb ecx,ecx ; 2
cmp eax,30h ; 1
adc ecx,0 ; 2
jns @notdigit ; 1
6 vs 7 µops
But the next code really is faster:
mov ecx,'0'-1 ; 1
sub ecx,eax ; 1 ecx<0 if eax>='0'
lea edx, ; 1 edx<0 if eax<='9'
and ecx,edx ; 1
js @@notdigit ; 1
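The and-based variant can be modelled in C as well. The lea operand is missing from the listing, so the sketch assumes it computes eax - '9' - 1, which is what the comment "edx<0 if eax<='9'" suggests. The function name is hypothetical.

```c
#include <stdint.h>

/* Models: mov ecx,'0'-1 / sub ecx,eax / lea edx,[eax-'9'-1] / and ecx,edx.
   ASSUMPTION: the lea operand is eax-'9'-1 (it is not shown in the thread).
   Returns the sign flag after the and: 1 for '0'..'9', 0 otherwise.
   Intended for character codes 0..255. */
int is_digit_and(uint32_t eax)
{
    uint32_t ecx = (uint32_t)('0' - 1) - eax;  /* negative if eax >= '0'  */
    uint32_t edx = eax - (uint32_t)'9' - 1u;   /* negative if eax <= '9'  */
    ecx &= edx;                                /* sign set if both are    */
    return (int)(ecx >> 31);
}
```

Both differences depend only on eax, not on each other, so the two can be computed side by side before the and. Since a set sign bit means "digit" here, a jump to a not-digit label has to be taken when the sign is clear.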
Nexo,
Code made from Eyin's one will be faster because it makes it possible
to run the instructions in parallel:
lea ecx,[-39h] ;1
sub eax,30h ;0
xor ecx,eax ;1
lea eax, ;0
js
2 clocks until js, even with cc checked in parallel.
Though the micro-op count for your code is right, there are dependencies
all the way through it.
The Svin,
Why would my code be slower?
1. If the clock counts you gave are for a Pentium, you are mistaken.
lea ecx,[-39h] ;1
sub eax,30h ;1-1
xor ecx,eax ;1
lea eax, ;2-1 AGI penalty for prev op in V-pipe
js
When you quote clock counts, you have to apply at least the basics of optimization.
2. If we look at the latest processors, the situation is different.
I measured the execution of the code using rdtsc on an AthlonXP processor.
1)REPT 256
lea ecx,
sub eax,30h
xor ecx,eax
lea eax,
ENDM
ret
2)REPT 256
mov ecx,'0'-1
sub ecx,eax
lea edx,
and ecx,edx
ENDM
ret
1) 760 clocks 2) 384 clocks
The reason for this is the cyclic execution :)
It is an example of incorrect measurement.
Really, the second code has the greater potential for parallelism.
js @@notdigit ; 1
Change it to jns @@notdigit.
Let's write a proc that validates an ASCIIZ string for characters
that are not in the "0" - "9" range.
Given: pointer to an ASCIIZ string.
Out: eax = 1 if all characters are in the "0"-"9" range,
eax = 0 if any character before the terminating zero is out of the range.
Example
for string:
"123453456",0 eax = 1
for string:
"1234zz1234",0 eax = 0
And then we run our tests again.
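As a reference to test any assembly version against, the specification above can be written down in C. This is a sketch, not one of the routines from the thread: it uses the single unsigned comparison (c - '0') <= 9 instead of any of the sequences discussed, the function name is hypothetical, and returning 1 for an empty string is an assumption, since the specification does not cover that case.

```c
#include <stdint.h>

/* Returns 1 if every character before the terminating zero is in '0'..'9',
   0 otherwise. An empty string yields 1 (ASSUMPTION, not in the spec).
   all_digits is a hypothetical name. */
int all_digits(const char *s)
{
    for (; *s != '\0'; ++s) {
        uint32_t c = (unsigned char)*s;
        /* One unsigned compare covers both bounds: characters below '0'
           wrap around to a huge value, characters above '9' exceed 9. */
        if (c - '0' > 9u)
            return 0;
    }
    return 1;
}
```

For the two example strings it returns 1 for "123453456" and 0 for "1234zz1234".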
It's good to have you here.
One more person who works on careful and original code solutions.
Nexo,
I tested your code on PMMX and PIII.
It was much faster than the one derived from Eyin's.
It may sound strange, but it's a lucky day when someone beats one of my
algos :)
So thanks.
Stay with us.
Now the tests (on both PMMX and PIII) are in favour of my new
version:
lea ecx,[-39h]
lea edx,
xor ecx,edx
In any case, these small slices of code are not fully optimized. Optimizing pieces like this in isolation is pointless, as the example of cyclic code I showed makes visible. In a concrete implementation a different solution may turn out to be best. Everything depends on the real surroundings of these instructions.
In real-life applications I take an easier route:
test, 1; mask for digit
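The operand of the test instruction is missing from the post, so the following is only one possible reading: a 256-entry table of character-class flags, indexed by the character, with bit 0 meaning "digit". The table, the flag name and both functions are hypothetical.

```c
#include <stdint.h>

#define CT_DIGIT 1u               /* hypothetical flag: bit 0 = digit */

static uint8_t char_class[256];   /* hypothetical class table, zero-initialized */

/* Fills in the digit flag; call once before using the lookup. */
void init_char_class(void)
{
    for (int c = '0'; c <= '9'; ++c)
        char_class[c] |= CT_DIGIT;
}

/* One table load and one mask test, the C counterpart of a single
   "test byte in table, 1" on the character's table entry. */
int is_digit_table(unsigned char c)
{
    return (char_class[c] & CT_DIGIT) != 0;
}
```

The same table can carry further flags in the other bits (letters, whitespace and so on), which is why this approach tends to win once more than one class has to be tested.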