Hi all!

I'm writing a fractal plotting program in Delphi that draws the Mandelbrot set on the screen. I'm now rewriting the most important parts in assembly, and so far it's going fine. I've come to a part where I need to convert a screen coordinate to a complex coordinate. I used to do it with this piece of Pascal code:

``````
s:=x/(imagewidth/(abs(xmin)+abs(xmax)))+xmin;
``````
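
For reference, here is a minimal C sketch of that Pascal mapping. C is used only so the arithmetic can be run outside Delphi; `double` stands in for Delphi's floating-point type, and both function names are hypothetical. It also shows one caveat: `abs(xmin)+abs(xmax)` equals the width of the real axis, `xmax-xmin`, only when `xmin <= 0 <= xmax`, so the two versions below diverge once you zoom into a region that lies entirely on one side of zero.

```c
#include <assert.h>
#include <math.h>

/* Direct transcription of s := x/(imagewidth/(abs(xmin)+abs(xmax)))+xmin; */
double screen_to_real(double x, double imagewidth, double xmin, double xmax)
{
    assert(imagewidth > 0.0);
    double pixels_per_unit = imagewidth / (fabs(xmin) + fabs(xmax));
    return x / pixels_per_unit + xmin;
}

/* Same mapping written with the real-axis width xmax - xmin.
   It agrees with screen_to_real only when xmin <= 0 <= xmax. */
double screen_to_real_width(double x, double imagewidth, double xmin, double xmax)
{
    assert(imagewidth > 0.0);
    return xmin + x * (xmax - xmin) / imagewidth;
}
```

With the usual view `xmin = -2`, `xmax = 1` both functions map pixel 640 of a 640-pixel-wide image to 1.0; with `xmin = 0.3`, `xmax = 0.4` the abs-based formula gives 1.0 there while the width-based one gives 0.4.
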

I translated it into:

``````

fld X		// st=x
fld imagewidth	// st=imagewidth,st(1)=x
fld Xmin	// st=Xmin,st(1)=imagewidth,st(2)=x
fabs		// st=abs(Xmin),st(1)=imagewidth,st(2)=x
fld Xmax	// st=Xmax,st(1)=abs(Xmin),st(2)=imagewidth,st(3)=x
fabs		// st=abs(Xmax),st(1)=abs(Xmin),st(2)=imagewidth,st(3)=x
fadd		// st=abs(Xmax)+abs(Xmin),st(1)=imagewidth,st(2)=x
fdiv		// st=imagewidth/(abs(Xmax)+abs(Xmin)),st(1)=x
fdiv		// st=x/(imagewidth/(abs(Xmax)+abs(Xmin)))
fadd Xmin	// st=x/(imagewidth/(abs(Xmax)+abs(Xmin)))+Xmin
fstp s		// s:=x/(imagewidth/(abs(Xmin)+abs(Xmax)))+Xmin;

``````

This works, but it's much slower than Delphi's output!

How can I optimize this???

---EDIT-----
doh!
--------------

/Delight
Posted on 2002-03-05 07:13:29 by Delight
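
In Intel syntax the no-operand forms `fadd` and `fdiv` compute `st(1) := st(1) op st(0)` and then pop, so evaluating the Pascal line on the FPU stack needs one addition for `abs(Xmin)+abs(Xmax)` and another for the final `+Xmin`. The C code below is a toy model of the x87 register stack, assuming that sequence; the helper names are hypothetical and the model ignores precision control, tag words and exceptions. It runs the sequence step by step and checks that the stack ends up empty.

```c
#include <assert.h>
#include <math.h>

/* Toy model of the x87 register stack: r[top-1] is st(0).
   Hypothetical helpers; only the arithmetic is modelled. */
typedef struct { double r[8]; int top; } Fpu;

static void fld(Fpu *f, double v) { assert(f->top < 8); f->r[f->top++] = v; }
static void fabs_(Fpu *f)          { f->r[f->top - 1] = fabs(f->r[f->top - 1]); }
/* No-operand fadd / fdiv: st(1) = st(1) op st(0), then pop. */
static void faddp(Fpu *f) { f->r[f->top - 2] += f->r[f->top - 1]; f->top--; }
static void fdivp(Fpu *f) { f->r[f->top - 2] /= f->r[f->top - 1]; f->top--; }
/* fadd mem: st(0) = st(0) + mem, no pop. */
static void fadd_mem(Fpu *f, double m) { f->r[f->top - 1] += m; }
static double fstp(Fpu *f) { assert(f->top > 0); return f->r[--f->top]; }

/* Evaluates s := x/(imagewidth/(abs(xmin)+abs(xmax)))+xmin on the model stack. */
double run_sequence(double x, double imagewidth, double xmin, double xmax)
{
    Fpu f = { {0}, 0 };
    fld(&f, x);                 /* st=x */
    fld(&f, imagewidth);        /* st=imagewidth, st(1)=x */
    fld(&f, xmin); fabs_(&f);   /* st=|xmin|, st(1)=imagewidth, st(2)=x */
    fld(&f, xmax); fabs_(&f);   /* st=|xmax|, st(1)=|xmin|, st(2)=imagewidth, st(3)=x */
    faddp(&f);                  /* st=|xmin|+|xmax|, st(1)=imagewidth, st(2)=x */
    fdivp(&f);                  /* st=imagewidth/(|xmin|+|xmax|), st(1)=x */
    fdivp(&f);                  /* st=x/(imagewidth/(|xmin|+|xmax|)) */
    fadd_mem(&f, xmin);         /* st=x/(imagewidth/(|xmin|+|xmax|))+xmin */
    double s = fstp(&f);        /* store the result and pop */
    assert(f.top == 0);         /* the stack must end up balanced */
    return s;
}
```

For example, `run_sequence(0, 640, -2, 1)` returns -2, the left edge of the default view.
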
x / ( y / z) = (x * z) / y

Faster

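``````
fld X           // st=x
fld Xmin
fabs            // st=abs(Xmin), st(1)=x
fld Xmax
fabs            // st=abs(Xmax), st(1)=abs(Xmin), st(2)=x
fadd            // st=abs(Xmin)+abs(Xmax), st(1)=x
fmul            // st=x*(abs(Xmin)+abs(Xmax))
fdiv imagewidth // st=x*(abs(Xmin)+abs(Xmax))/imagewidth
fadd Xmin       // st=x*(abs(Xmin)+abs(Xmax))/imagewidth+Xmin
fstp s
``````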

Mirno
Posted on 2002-03-05 08:14:33 by Mirno
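
Mirno's identity replaces one of the two divisions with a multiplication, and FDIV is considerably slower than FMUL on x87 hardware. A minimal C sketch of both forms (hypothetical function names, `double` for Delphi's real type); they agree up to rounding, because each form rounds at different steps.

```c
#include <assert.h>
#include <math.h>

/* Two divisions, exactly as in the original Pascal line. */
double map_two_div(double x, double imagewidth, double xmin, double xmax)
{
    assert(imagewidth != 0.0);
    return x / (imagewidth / (fabs(xmin) + fabs(xmax))) + xmin;
}

/* Rewritten with x / (y / z) == (x * z) / y: one multiply, one divide. */
double map_one_div(double x, double imagewidth, double xmin, double xmax)
{
    assert(imagewidth != 0.0);
    return x * (fabs(xmin) + fabs(xmax)) / imagewidth + xmin;
}
```
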
Thank you Mirno! One step closer to perfection...:grin:

/Delight
Posted on 2002-03-05 08:41:41 by Delight
You should also try to avoid loading a value twice from memory, as you do here with Xmin. Instead, load it onto the stack at the start and then reuse it.

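``````
fld Xmin        // st=Xmin, loaded once and kept for the final add
fld X           // st=x, st(1)=Xmin
fld st(1)       // st=Xmin (copy), st(1)=x, st(2)=Xmin
fabs            // st=abs(Xmin)
fld Xmax
fabs            // st=abs(Xmax), st(1)=abs(Xmin), st(2)=x, st(3)=Xmin
fadd            // st=abs(Xmin)+abs(Xmax), st(1)=x, st(2)=Xmin
fmul            // st=x*(abs(Xmin)+abs(Xmax)), st(1)=Xmin
fdiv imageWidth // st=x*(abs(Xmin)+abs(Xmax))/imageWidth, st(1)=Xmin
fadd            // st=x*(abs(Xmin)+abs(Xmax))/imageWidth+Xmin
fstp s          // store the result; the FPU stack is empty again
``````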

Also, if you don't need to preserve Xmax, you could get its absolute value by clearing its sign bit in memory (ANDing that bit with 0) and then simply adding it from memory.
Posted on 2002-03-05 09:21:51 by Eóin
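
Eóin's trick works because an IEEE 754 double keeps its sign in the top bit. On x86, which is little-endian, that bit is bit 31 of the dword at `[Xmax+4]`, so `and dword ptr [Xmax+4], 7FFFFFFFh` followed by `fadd Xmax` adds abs(Xmax) straight from memory, at the cost of overwriting Xmax. A portable C sketch of the same bit operation (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clear the IEEE 754 sign bit of a double in place, the C analogue
   of ANDing the top bit of a real8 in memory with 0. */
void clear_sign_bit(double *v)
{
    assert(v != NULL);
    uint64_t bits;
    memcpy(&bits, v, sizeof bits);     /* reinterpret without aliasing problems */
    bits &= ~(UINT64_C(1) << 63);      /* the sign is the top bit of the 64 */
    memcpy(v, &bits, sizeof bits);
}
```
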
I think that the fastest way to clear a real8 number is to do:

``````
var db ?
...
xor var, var
``````

Marilyn
Posted on 2002-03-05 09:22:29 by Marilyn
You can't xor a memory variable with a memory variable.
Posted on 2002-03-05 09:42:40 by Qweerdy
A real8 is eight bytes. Storing all eight in one instruction requires MMX or the FPU, and either would first need an extra instruction to load the zero. I think this would be fastest/shortest:
``````
and DWORD PTR [var],0
and DWORD PTR [var + 4],0
``````
Posted on 2002-03-05 10:02:52 by bitRAKE
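
bitRAKE's two 32-bit ANDs work because a real8 whose eight bytes are all zero is +0.0 in IEEE 754 double format. A C sketch with the same end result, using two 4-byte stores (hypothetical helper name; `memcpy` avoids strict-aliasing problems):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Zero an 8-byte real with two 32-bit stores; same end result as
   and dword ptr [var],0 / and dword ptr [var+4],0. */
void clear_real8(double *var)
{
    assert(var != NULL);
    const uint32_t zero = 0;
    unsigned char *p = (unsigned char *)var;
    memcpy(p, &zero, sizeof zero);       /* low dword,  [var]   */
    memcpy(p + 4, &zero, sizeof zero);   /* high dword, [var+4] */
}
```
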
Thanks, but shouldn't that 0 be -1 ???

/Delight
Posted on 2002-03-05 10:11:10 by Delight

Thanks, but shouldn't that 0 be -1 ???

/Delight
No.

Z AND -1 = Z
Z AND 0 = 0 ; you wanted to clear it?

Z OR -1 = -1
Z OR 0 = Z

Z XOR -1 = NOT Z
Z XOR 0 = Z
Posted on 2002-03-05 10:14:19 by bitRAKE
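
The table can be checked mechanically. A small C sketch for 32-bit values, where -1 is the all-ones pattern `0xFFFFFFFF` (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every row of the AND/OR/XOR table holds for z. */
int identities_hold(uint32_t z)
{
    const uint32_t all = UINT32_MAX;       /* -1 as a 32-bit pattern */
    return (z & all) == z                  /* Z AND -1 = Z     */
        && (z & 0u)  == 0u                 /* Z AND 0  = 0     */
        && (z | all) == all                /* Z OR -1  = -1    */
        && (z | 0u)  == z                  /* Z OR 0   = Z     */
        && (z ^ all) == (uint32_t)~z       /* Z XOR -1 = NOT Z */
        && (z ^ 0u)  == z;                 /* Z XOR 0  = Z     */
}
```
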
Ok, now I get it. Thanks a lot!

/Delight
:stupid:
Posted on 2002-03-05 10:18:52 by Delight
Wouldn't it be better to move zero to the memory location?
It would avoid a read-modify-write operation.

Mirno
Posted on 2002-03-05 10:43:12 by Mirno

Wouldn't it be better to move zero to the memory location?
It would avoid a read-modify-write operation.

Mirno
Yes, if that memory location is accessed afterward; otherwise it wouldn't matter on the Athlon. AND also affects the flags, so it might be better to just move the zero. :)
Posted on 2002-03-05 11:39:47 by bitRAKE
Hi !

One more FPU optimization takes two steps:

i) Start your FPU block on a 16-byte boundary, i.e. at addresses like \$42010, \$64df0, ...

ii) If an FPU instruction would cross a 16-byte boundary (for example, a 5-byte instruction starting at \$4201e), insert NOP padding (or integer code that can execute in parallel) so that the FPU instruction starts at the next 16-byte boundary.

This helps because instructions are fetched in 16-byte blocks.

Greetings, Caleb
Posted on 2002-03-05 15:59:48 by Caleb
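
Caleb's rule boils down to a little address arithmetic. A C sketch with hypothetical helper names, assuming the 16-byte fetch block described in the post:

```c
#include <assert.h>
#include <stdint.h>

/* Does an instruction of len bytes starting at addr span two 16-byte blocks? */
int crosses_16(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    return (addr / 16) != ((addr + len - 1) / 16);
}

/* Filler bytes (NOPs or independent integer code) needed to start the
   instruction at the next 16-byte boundary; 0 if it already fits. */
uint32_t pad_to_next_16(uint32_t addr, uint32_t len)
{
    return crosses_16(addr, len) ? (16 - (addr % 16)) % 16 : 0;
}
```

For the example in the post, a 5-byte instruction at `0x4201e` would run past the boundary at `0x42020`, and two filler bytes move it to start exactly there.
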
NO, No, No
Remember that the Pentium can pair instructions:

``````
mov ecx, 0
xor eax, eax
mov dword ptr buffer, eax
mov ecx, 0
``````

is faster than:

``````
mov ecx, 0
mov dword ptr buffer, 0
mov ecx, 0
``````

Just remember to alternate registers.

And check these out:

``````
mov eax, dword ptr buffer
push eax
call empty
pop eax

mov eax, 0
push eax
call empty
pop eax
``````
Posted on 2002-03-06 20:44:05 by bdjames