What's the fastest way to clear the FPU stack?
A loop with "fstp"?
Posted on 2002-12-10 18:39:09 by x86asm
finit
Posted on 2002-12-10 18:45:47 by Qages
Whoa! I promise you, almost anything will be faster than that ;)
My money's on EMMS (sets FPU tag word to 'all empty')
Posted on 2002-12-10 18:57:08 by Jan Wassenberg
EMMS is a killer instruction in terms of CPU time. I wish I could use FEMMS, but not all CPUs are made by AMD (even though I have an Athlon).
Would it be better to store the tag word, modify it and load it back?
Posted on 2002-12-10 19:01:43 by x86asm
Doh! My AMD slant is becoming apparent ;)
Seriously, though, what's the target processor? Definitely avoid EMMS on MMX Pentiums, but there's no problem on P6 / Athlons (XP: basically a NOP with effective latency = 2 clocks).
If you can hide the delay with integer code, even better.
Posted on 2002-12-10 19:14:06 by Jan Wassenberg
Also, on AMD you can use FFREEP to pop the stack, as it gets converted to NOPs internally. IIRC, FFREEP is available on all Pentium+ CPUs, but only documented for AMD processors. :) Usually, a couple of FFREEPs at the end of your proc will keep the stack level (i.e. there is rarely a need to clear the whole thing).
Posted on 2002-12-10 20:32:26 by bitRAKE
FINIT is fast on my machine; RDTSC says so.
Posted on 2002-12-10 21:04:34 by Qages
How many clocks does FINIT take, and what is your CPU, Qages? I think FFREEP is good; thanks, bitRAKE.
Posted on 2002-12-10 21:08:35 by x86asm
FFREEP doesn't assemble for me.
Posted on 2002-12-10 21:16:25 by Qages

FFREEP st(i) = DF C0+i

For example:

FFREEP st(1)

would assemble to:

db 0DFh, 0C1h



P.S. MASM supports FFREEP; I use the following directives:
.686p
.MMX
.K3D ; most likely this one?
.XMM
I'm not sure which one enables it.
Posted on 2002-12-10 21:30:59 by bitRAKE
The db 0DFh, 0C1h for FFREEP shows up as ??? in the MSVC++ debugger. Well, I give up on testing its speed.
Posted on 2002-12-10 22:35:24 by Qages
OllyDbg sees FFREEP okay.

An Athlon can do 3 FFREEP instructions per cycle (no dependencies, two-cycle latency). :)

ffreep st(7)
ffreep st(5)
ffreep st(3)
ffreep st(1)

; this clears all eight stack positions (4 marked empty, 4 popped)! :grin:

...or this might compress better:

;clear 8
ffreep st(4)
ffreep st(4)
ffreep st(4)
ffreep st(4)

;clear 6
ffreep st(3)
ffreep st(3)
ffreep st(3)

;clear 4
ffreep st(2)
ffreep st(2)


This is better than using FCOMPP, but use EMMS if clearing the whole stack.
Posted on 2002-12-10 22:42:27 by bitRAKE
2 cycles?! Wow, that's pretty good! That makes me happy I bought an Athlon, hehe :D
Posted on 2002-12-11 18:21:50 by x86asm