You've done it very well.
It's optimal from both the math and the asm point of view (well paired).
Now try to do it on the FPU.
Assume that
.data?
x real8 ?
y real8 ?
subOfXYsq real8 ?
BTW, I hope you use a debugger.
Nothing makes the case for a debugger as clearly as learning instructions and
testing pieces of code.
And, consequently, the case for being able to load freshly written code into the debugger quickly and easily (I do it with one keystroke in my shell).
Posted on 2001-12-29 03:37:26 by The Svin
Here is my try on the FPU (like I said before, I'm not good at this):




fld x           ; st(0) = x
fadd y          ; st(0) = x+y
fld x           ; st(0) = x, st(1) = x+y
fsub y          ; st(0) = x-y, st(1) = x+y
fmul            ; st(0) = (x+y)*(x-y), one stack entry popped
fstp subOfXYsq  ; store the result and pop
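
As a cross-check of the stack comments, here is a small C model of the same sequence. It is only a sketch: it assumes the task is x^2 - y^2 computed as (x+y)*(x-y), which is what the name subOfXYsq and the code suggest, and the function name diff_of_squares is made up for illustration.

```c
/* Sketch: C model of the FPU sequence, one temporary per stack slot. */
double diff_of_squares(double x, double y)
{
    double sum  = x + y;   /* fld x / fadd y  -> x+y            */
    double diff = x - y;   /* fld x / fsub y  -> x-y            */
    return sum * diff;     /* fmul / fstp     -> (x+y)*(x-y)    */
}
```

For example, diff_of_squares(5.0, 3.0) multiplies 8 by 2, which matches 25 - 9.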





bye
eko
Posted on 2001-12-29 10:56:12 by eko
eko:
Here is one more simple task (5th grade).
eax = side of square1
ecx = side of square2
We don't know which one is bigger.
Task:
Find the absolute (positive) difference of the perimeters of these two squares
without branching.
Give solutions for both the FPU and integer instructions.
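
One possible integer approach, sketched in C rather than asm (it is not the solution posted in this thread): the difference of the perimeters is 4*|a-c|, and the absolute value can be taken with a sign mask instead of a jump. The name perimeter_diff_int is hypothetical, and the sketch assumes the sides differ by less than 2^31 and that the result fits in 32 bits. The comments name the x86 instructions each step would presumably map to. On the FPU the same idea could rest on the fabs instruction, which clears the sign without a jump.

```c
#include <stdint.h>

/* Sketch: branchless 4*|a-c| using a sign mask (assumes |a-c| < 2^31). */
uint32_t perimeter_diff_int(uint32_t a, uint32_t c)
{
    uint32_t d    = a - c;            /* sub eax,ecx (wraps if a < c)       */
    uint32_t mask = 0u - (d >> 31);   /* cdq: all ones if d is negative     */
    d = (d ^ mask) - mask;            /* xor eax,edx / sub eax,edx -> |d|   */
    return d << 2;                    /* shl eax,2: four sides per square   */
}
```

With sides 3 and 5 this returns 8 in either argument order.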
----------------------------------------------------------------------------------
Another task:
Find the alternating-sign sum of the hex digits of a dword.
Alternating-sign means that for the hex value 1234AFBCh
you need to find
1-2+3-4+A-F+B-C
It can be expressed as the sum of the digits in odd positions minus the sum of the digits in even positions:
(1+3+A+B)-(2+4+F+C)
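
To pin down the expected result before optimizing, a plain reference version in C can help. This is only an unoptimized sketch for checking answers, not an entry for the task; the name alt_hex_sum is made up here.

```c
#include <stdint.h>

/* Sketch: reference alternating-sign sum of the 8 hex digits, left to right. */
int alt_hex_sum(uint32_t v)
{
    int sum = 0;
    for (int shift = 28; shift >= 0; shift -= 8) {
        sum += (int)((v >> shift) & 0xFu);        /* 1st, 3rd, 5th, 7th digit: plus  */
        sum -= (int)((v >> (shift - 4)) & 0xFu);  /* 2nd, 4th, 6th, 8th digit: minus */
    }
    return sum;
}
```

For 1234AFBCh this gives (1+3+10+11)-(2+4+15+12) = 25-33 = -8.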

Get Anger.hlp with the instruction set and use it to optimize for the best pairing.

Good Luck!
We don't know tricks - we just invent them ;)
Posted on 2002-01-13 23:02:01 by The Svin
I thought about the task above and have posted my first tries.
Please don't look if you wish to solve it yourself. ;)
Posted on 2002-01-15 12:58:23 by bitRAKE
I couldn't help myself :)
Good code.
The 1st uses the same logic as mine but is one clock faster.
I missed the obvious thing that - -1 is the same as +1 :)
The problem with the second code, despite its good ideas, is dependencies:
it takes 11 clocks:
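mov edx,eax          ;1
and eax,0F0F0F0Fh    ;0  pairs with the mov; eax = low nibbles
shr edx,4            ;1
and edx,0F0F0F0Fh    ;1  edx = high nibbles
or edx,10101010h     ;1  bias each byte by 16 so no byte borrows
sub edx,eax          ;1  each byte = 16 + high - low
mov eax,edx          ;1
shr edx,16           ;1
add eax,edx          ;1  fold the upper word onto the lower word
add al,ah            ;1  al = 64 + result
and eax,0FFh         ;1
sub eax,64           ;1  remove the 4 x 16 bias; eax = signed result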
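
The 11-clock sequence relies on a bias trick: OR-ing 10h into every byte before the subtraction keeps each byte positive, and the total bias of 64 is removed at the end. A C model of the same steps makes that easier to check. It is a sketch; the name alt_sum_biased is hypothetical.

```c
#include <stdint.h>

/* Sketch: C model of the biased byte-wise subtraction used above. */
int32_t alt_sum_biased(uint32_t x)
{
    uint32_t lo = x & 0x0F0F0F0Fu;          /* and eax,0F0F0F0Fh           */
    uint32_t hi = (x >> 4) & 0x0F0F0F0Fu;   /* shr edx,4 / and edx,...     */
    hi |= 0x10101010u;                      /* each byte becomes 16 + high */
    hi -= lo;                               /* each byte = 16 + high - low */
    uint32_t t = hi + (hi >> 16);           /* mov / shr 16 / add          */
    t = (t + (t >> 8)) & 0xFFu;             /* add al,ah / and eax,0FFh    */
    return (int32_t)t - 64;                 /* sub eax,64                  */
}
```

Every step consumes the result of the one before it, which is the dependency chain the clock counts point at.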

Here is one possible solution (I have 5 different versions)
that removes the dependencies (7 clocks):

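mov ebx,eax          ; keep a copy for the low nibbles
shr eax,4

and ebx,0f0f0f0fh    ; ebx = low nibbles
and eax,0f0f0f0fh    ; eax = high nibbles

mov edx,ebx
mov ecx,eax

rol edx,16           ; swap the words so the bytes line up
rol ecx,16

add ebx,edx          ; bl = byte0+byte2, bh = byte1+byte3 (low nibbles)
add eax,ecx          ; al, ah = the same for the high nibbles

add bl,bh            ; bl = sum of the low nibbles
add al,ah            ; al = sum of the high nibbles

sub al,bl            ; al = signed result (the rest of eax is not meaningful)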
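
A C model of the 7-clock version shows where the independence comes from: the low and high nibbles are summed in two separate chains that only meet in the final subtraction. This is a sketch with made-up names (rol16, alt_sum_split); the return value is the signed reading of al after sub al,bl.

```c
#include <stdint.h>

/* Rotate a 32-bit value by 16 bits (models rol reg,16). */
static uint32_t rol16(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

/* Sketch: two independent chains, one per nibble position. */
int alt_sum_split(uint32_t x)
{
    uint32_t lo = x & 0x0F0F0F0Fu;          /* ebx chain: low nibbles  */
    uint32_t hi = (x >> 4) & 0x0F0F0F0Fu;   /* eax chain: high nibbles */
    lo += rol16(lo);                        /* mov / rol / add         */
    hi += rol16(hi);
    uint8_t bl = (uint8_t)(lo + (lo >> 8)); /* add bl,bh               */
    uint8_t al = (uint8_t)(hi + (hi >> 8)); /* add al,ah               */
    /* Both sums are at most 60, so this difference equals the
       signed byte left in al by sub al,bl. */
    return (int)al - (int)bl;
}
```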
Posted on 2002-01-31 18:24:52 by The Svin