While reading Raymond's FPU tutorial ( http://www.website.masmforum.com/tutorials/fptute/fpuchap7.htm#fxam ), I noticed that "fwait" is used to ensure synchronization. However, I thought that the FPU, like the x86 CPU itself, was synchronous by default as long as multiple threads are not used.
I admit I'm a bit confused seeing this:


  criteria  dt 3.3333e-15 ;could be initialized or set by the program
  temp_var  dt ?          ;could be initialized or set by the program


  fld  temp_var    ;load previous value
                    ;=> ST(0)=previous value, ST(1)=current value
  fsub  st,st(1)    ;difference with current value
  fabs              ;get the absolute value of the difference
  fld  criteria    ;load the criteria
                    ;=> ST(0)=criteria, ST(1)=abs(difference), ST(2)=current value
  fcompp            ;compare the criteria to the difference
                    ;and discard both values from the FPU
                    ;=> ST(0)=current value
  fstsw ax          ;retrieve comparison result in the AX register
  fwait             ;ensure the previous instruction is completed
  sahf              ;transfer the condition codes to the CPU's flag register

;In this type of code, the computed values should already have been verified
;to be valid numbers. Their difference should thus be a valid number, as well
;as the criteria. Therefore no need to check for an indeterminate comparison.

  ja    criteria_greater ;criteria was ST(0) for comparison
  jb    criteria_lower
  jz    criteria_equal

Does it mean that I have to take care of synchronization each time I do this?

    fld  dword ptr                      
    fild  dword ptr
    fmulp ST(1),ST
    fwait                                  ; Is this needed ?
    fistp dword ptr
Posted on 2006-09-13 04:44:16 by Axial
The fwait in your example is definitely NOT needed. The FPU will not execute any instruction that depends on the result of previous instructions until those instructions have completed.

The fwait instruction is used in examples of the tutorial before the CPU must use data stored by the FPU. Although modern computers most probably have fully synchronized CPU/FPU systems and the fwait may not be necessary anymore, it is still suggested as a precaution.

Posted on 2006-09-13 21:17:39 by Raymond
Thanks for the quick answer Raymond ;)

I now understand that the fwait I used in my example was misplaced.
Here is where I now place fwait throughout my code:

    fistp dword ptr
    add  eax, dword ptr   ; Synchronise CPU/FPU

But I wonder, is it recommended to use fwait in time-critical pieces of code? I mean, how many cycles does it take?
Posted on 2006-09-14 05:57:31 by Axial
IIRC, FWAIT was necessary on prehistoric x86 CPUs, where the FPU was on a separate chip. I never use this instruction, and I have never had any problems (even though my projects rely heavily on the FPU). But I could be wrong. Anyway, some time ago I tried putting it in my code and ran benchmarks: the fwait was either not necessary or slowed my code down by 1 cycle (tests done on a K6-2).
Posted on 2006-09-14 12:14:17 by Ultrano
FWAIT raises any pending FPU exceptions. That's all it does. FPU exceptions don't get raised when they occur (as opposed to the CPU's exceptions). You use "FWAIT" to raise any pending (queued) exceptions. FPU exceptions are so unimportant that everyone masks them all out and forgets about them. So you don't see many 'fwaits' in 'modern' code. And on top of that, SSE supersedes the FPU, which is being kept only for compatibility. IMHO, unnecessarily.
Posted on 2006-09-17 15:39:24 by ti_mo_n

FPU exceptions are so unimportant that everyone masks them all out and forgets about them.

FPU exceptions can be handled either by the FPU itself (with default actions) or by the programmer with error handling code. In its initialized state, the FPU assumes that it will handle exceptions and all of them are masked (nobody else has to mask them). If the programmer wants to handle some exceptions with special code, the programmer must "unmask" whichever exceptions will be covered with additional specific code.

Regardless of those masks, ALL exceptions are recorded in the Status Word in a cumulative way. Any good programmer would inspect that Status Word whenever there is a possibility that an important exception would have been raised and handled by the FPU, before proceeding further with potential garbage data.
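
As a rough illustration of the cumulative flags described above, here is a minimal C++ sketch rather than FPU assembly. It assumes a hosted compiler that provides <cfenv> and the default state where all exceptions are masked. The function names are hypothetical, and volatile is used only to keep the compiler from folding the arithmetic at compile time. std::fetestexcept is the portable way to read the sticky floating-point exception flags, which is the same idea as inspecting the Status Word.

```cpp
#include <cfenv>

// Hypothetical helper: shows that a flag set by one operation
// survives later, perfectly valid operations until it is cleared.
bool divide_by_zero_flag_is_sticky()
{
    volatile double zero = 0.0, one = 1.0, two = 2.0;
    volatile double r;

    std::feclearexcept(FE_ALL_EXCEPT);   // start with no flags set
    r = one / zero;                      // masked: no trap, flag is recorded
    r = one + two;                       // valid operation, does not clear the flag
    (void)r;

    return std::fetestexcept(FE_DIVBYZERO) != 0;
}

// Hypothetical helper: performs a division and reports whether it
// completed without a divide-by-zero or invalid-operation flag.
bool division_is_clean(double num, double den, double &quotient)
{
    volatile double n = num, d = den;

    std::feclearexcept(FE_ALL_EXCEPT);   // discard flags from earlier work
    volatile double r = n / d;           // division happens at run time
    bool clean = std::fetestexcept(FE_DIVBYZERO | FE_INVALID) == 0;

    quotient = r;
    return clean;
}
```

With the exceptions masked, dividing 1.0 by 0.0 through the second helper does not trap, yet the helper reports the result as not clean. That is the situation where checking the flags before using the data matters.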

BTW, does SSE handle the full 80-bit range of the FPU or is it limited to the 32-bit (or 64-bit) type??
Can SSE simulate ALL the FPU instructions??

(I am aware of some of the advantages of SSE (and MMX) but the main question is "can they fully replace the FPU".)

Posted on 2006-09-17 21:27:32 by Raymond