While reading Raymond's FPU tutorial ( http://www.website.masmforum.com/tutorials/fptute/fpuchap7.htm#fxam ), I noticed that "fwait" is used to ensure synchronization. However, I thought that the FPU architecture, like the x86 one, was synchronous by default as long as multiple threads are not used.
I admit I am a bit confused by this:
.data
criteria dt 3.3333e-15 ;could be initialized or set by the program
temp_var dt ? ;could be initialized or set by the program
.code
fld temp_var ;load previous value
;=> ST(0)=previous value, ST(1)=current value
fsub st,st(1) ;difference with current value
fabs ;get the absolute value of the difference
fld criteria ;load the criteria
;=> ST(0)=criteria, ST(1)=abs(difference), ST(2)=current value
fcompp ;compare the criteria to the difference
;and discard both values from the FPU
;=> ST(0)=current value
fstsw ax ;retrieve comparison result in the AX register
fwait ;ensure the previous instruction is completed
sahf ;transfer the condition codes to the CPU's flag register
;In this type of code, the computed values should already have been verified
;to be valid numbers. Their difference should thus be a valid number, as well
;as the criteria. Therefore no need to check for an indeterminate comparison.
ja criteria_greater ;criteria was ST(0) for comparison
jb criteria_lower
jz criteria_equal
Does it mean that I have to take care of synchronization each time I do this?
fld dword ptr
fild dword ptr
fmulp ST(1),ST
fwait ; Is this needed ?
fistp dword ptr
The fwait in your example is definitely NOT needed. The FPU will not start executing an instruction that depends on the result of a previous instruction until that previous instruction has completed.
The fwait instruction is used in examples of the tutorial before the CPU must use data stored by the FPU. Although modern computers most probably have fully synchronized CPU/FPU systems and the fwait may not be necessary anymore, it is still suggested as a precaution.
Raymond
Thanks for the quick answer Raymond ;)
I now understand that the fwait I used in my example was misplaced.
Here is how I placed fwait throughout my code:
[...]
fistp dword ptr
fwait
add eax, dword ptr ; Synchronise CPU/FPU
[...]
But I wonder, is it recommended to use fwait in time-critical pieces of code? I mean, how many cycles does it take?
IIRC, FWAIT was necessary on prehistoric x86 CPUs, where the FPU was on a separate chip. I never use this instruction, and I have never had problems, even though my projects rely heavily on the FPU. But I could be wrong. Anyway, some time ago I tried putting it in my code and ran benchmarks: the fwait either made no difference or slowed my code down by 1 cycle (tests done on a K6-2).
FWAIT raises any pending FPU exceptions. That's all it does. FPU exceptions don't get raised when they occur (as opposed to the CPU's exceptions). You use "FWAIT" to raise any pending (queued) exceptions. FPU exceptions are so unimportant that everyone masks them all out and forgets about them. So you don't see many 'fwaits' in 'modern' code. And on top of that, SSE supersedes the FPU, which is being kept only for compatibility. IMHO, unnecessarily.
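As a rough illustration of the "pending exception" behaviour described above, here is a minimal sketch in the thread's MASM syntax. The variable name ctrl_word is hypothetical, and the sketch assumes that no exception handler is installed, so on a typical OS the program would simply fault once the exception is reported. The point is only where it is reported: at the fwait (or at the next waiting FPU instruction), not at the fdivp that caused it.

```asm
.data
ctrl_word dw ?              ;hypothetical scratch variable for the Control Word
.code
fnstcw ctrl_word            ;read the current Control Word
and ctrl_word,0FFFBh        ;clear bit 2 (ZM) to unmask the zero-divide exception
fldcw ctrl_word             ;reload the modified Control Word
fld1                        ;=> ST(0)=1.0
fldz                        ;=> ST(0)=0.0, ST(1)=1.0
fdivp st(1),st              ;1.0/0.0 sets the ZE flag; the exception is only pending
fwait                       ;the pending unmasked exception is reported here
```

With the default (fully masked) Control Word, the same fwait would have nothing to report, which is consistent with the benchmark observation that it makes little or no difference in ordinary code.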
FPU exceptions are so unimportant that everyone masks them all out and forgets about them.
FPU exceptions can be handled either by the FPU itself (with default actions) or by the programmer with error handling code. In its initialized state, the FPU assumes that it will handle exceptions and all of them are masked (nobody else has to mask them). If the programmer wants to handle some exceptions with special code, the programmer must "unmask" whichever exceptions will be covered with additional specific code.
Regardless of those masks, ALL exceptions are recorded in the Status Word in a cumulative way. Any good programmer would inspect that Status Word whenever there is a possibility that an important exception has been raised and handled by the FPU, before proceeding further with potentially garbage data.
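The Status Word check described above could look something like the following. This is only a sketch: the names numer, denom, quot, fpu_error and done are hypothetical, the operands are chosen to force a zero-divide, and all exceptions are assumed to be masked (the initialized state of the FPU).

```asm
.data
numer REAL8 1.0             ;hypothetical operands, chosen to force a zero-divide
denom REAL8 0.0
quot  REAL8 ?
.code
fclex                       ;clear flags left over from earlier code
fld numer                   ;=> ST(0)=numer
fdiv denom                  ;=> ST(0)=numer/denom (infinity here, since ZM is masked)
fstsw ax                    ;copy the Status Word to AX
test al,00001101b           ;bit 0=Invalid, bit 2=Zero divide, bit 3=Overflow
jnz fpu_error               ;at least one of those flags was recorded
fstp quot                   ;no exception recorded: store the result
jmp done
fpu_error:
fstp st(0)                  ;discard the suspect value
fclex                       ;reset the flags before the next computation
done:
```

Because the flags are cumulative, a single check after a longer sequence of instructions is enough, provided the flags were cleared before the sequence started.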
BTW, does SSE handle the full 80-bit range of the FPU or is it limited to the 32-bit (or 64-bit) type??
Can SSE simulate ALL the FPU instructions??
(I am aware of some of the advantages of SSE (and MMX) but the main question is "can they fully replace the FPU".)
Raymond