What's the fastest way to clean up the FPU stack?
A loop with "fstp"?
finit
Whoa! I promise you, almost anything will be faster than that ;)
My money's on EMMS (sets FPU tag word to 'all empty')
EMMS is a killer instruction in terms of CPU time. I wish I could use FEMMS, but not all CPUs are made by AMD (even though I have an Athlon).
Would it be better to store the tag word, modify it and store it back?
Doh! My AMD slant is becoming apparent ;)
Seriously, though, what's the target processor? Definitely avoid EMMS on MMX Pentiums, but there's no problem on P6 / Athlons (XP: basically a NOP with effective latency = 2 clocks).
If you can hide the delay with integer code, even better.
Also, on AMD you can use FFREEP to pop the stack, as it gets converted to NOPs internally. IIRC, FFREEP is available on all Pentium+ CPUs, but only documented for AMD processors. :) Usually, a couple of FFREEPs at the end of your proc will keep the stack level (i.e. there is rarely a need to clear the whole thing).
FINIT is fast on my machine; RDTSC says so.
How many clocks does FINIT take, and what is your CPU, Qages? I think FFREEP is good, thanks BitRake.
FFREEP doesn't assemble for me.
For example:
FFREEP st(1)
would assemble to:
db 0DFh, 0C1h
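The same pattern covers the other registers: FFREEP ST(i) encodes as the two bytes 0DFh, 0C0h + i. Here is a rough Python sketch that prints the db line for any register, in case your assembler refuses the mnemonic. The helper names ffreep_bytes and ffreep_db are made up for this illustration and are not part of any tool.

```python
def ffreep_bytes(i):
    """Return the two opcode bytes for FFREEP ST(i): 0DFh, 0C0h + i."""
    if not 0 <= i <= 7:
        raise ValueError("x87 register index must be 0..7")
    return bytes([0xDF, 0xC0 + i])


def ffreep_db(i):
    """Format those bytes as a MASM-style db line (hypothetical helper)."""
    return "db " + ", ".join("0%02Xh" % b for b in ffreep_bytes(i))


if __name__ == "__main__":
    # Print the db lines for ST(0) through ST(7).
    for reg in range(8):
        print(ffreep_db(reg))
```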
P.S. MASM supports FFREEP; I use the following:
.686p
.MMX
.K3D ; most likely this?
.XMM
I'm not sure what turns it on.
The db 0DFh, 0C1h for the FFREEP comes up as ??? in the MSVC++ debugger. Well, I give up on testing its speed.
OllyDbg sees FFREEP okay.
An Athlon can do 3 FFREEP instructions per cycle (no dependencies, two-cycle latency). :)
ffreep st(7)
ffreep st(5)
ffreep st(3)
ffreep st(1)
; this clears all 8 stack positions (4 marked empty, 4 popped)! :grin:
...or this might compress better:
;clear 8
ffreep st(4)
ffreep st(4)
ffreep st(4)
ffreep st(4)
;clear 6
ffreep st(3)
ffreep st(3)
ffreep st(3)
;clear 4
ffreep st(2)
ffreep st(2)
This is better than using FCOMPP, but use EMMS if clearing the whole stack.
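To sanity-check those sequences without a debugger, here is a toy Python model of the x87 tag state. It is only a sketch under one assumption: that FFREEP ST(i) marks ST(i) empty and then pops, with the pop also marking the old ST(0) empty and incrementing TOP (this matches the "4 marked empty, 4 popped" count above). The class and function names are invented for the illustration.

```python
class X87Tags:
    """Toy model: eight empty/valid flags plus the TOP pointer."""

    def __init__(self, depth=0):
        # Model 'depth' values pushed onto an empty stack;
        # every push decrements TOP (mod 8).
        self.top = (8 - depth) % 8
        self.valid = [False] * 8
        for k in range(depth):
            self.valid[(self.top + k) % 8] = True

    def ffreep(self, i):
        # ASSUMED semantics: free ST(i), then pop
        # (old ST(0) tagged empty, TOP incremented).
        self.valid[(self.top + i) % 8] = False
        self.valid[self.top] = False
        self.top = (self.top + 1) % 8

    def emms(self):
        # EMMS sets the whole tag word to 'all empty'.
        self.valid = [False] * 8

    def in_use(self):
        return sum(self.valid)


def left_after(depth, operands):
    """Registers still in use after FFREEP ST(i) for each i in operands."""
    fpu = X87Tags(depth)
    for i in operands:
        fpu.ffreep(i)
    return fpu.in_use()


if __name__ == "__main__":
    print(left_after(8, [7, 5, 3, 1]))  # full stack, first sequence
    print(left_after(8, [4, 4, 4, 4]))  # ;clear 8
    print(left_after(6, [3, 3, 3]))     # ;clear 6
    print(left_after(4, [2, 2]))        # ;clear 4
```

Under that assumption each sequence leaves nothing in use, and each FFREEP retires two registers, which is why n instructions are enough for a stack that is 2n deep.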
2 CYCLES!?!? Wow, that's pretty good! That makes me happy I bought an Athlon, hehe :D