I just noticed that on my core2duo fstcw returns 27Fh (precision = 53 bits) at app-start.
Meanwhile on an AthlonXP it returns 37Fh (precision = 64 bits) at app-start. Both systems are WinXP SP2.
Could someone point me to info on what governs the default CW? (My googling skills fail me this time.) I grew up on AMD CPUs and remember that precision is at max by default; these days I needed some extreme FP precision, which is when I noticed this C2D's default setting. Hopefully I or some drivers simply messed up a registry setting, or the precision setting is outdated and everything is internally calculated at max anyway.
This is normally controlled by the OS, but it can be modified by various libraries; DirectX was notorious for forcing the application into single precision, and other libs may force long double (extended) precision.
In Windows, 53-bit precision is the default; see this page for more info:
http://msdn.microsoft.com/en-us/library/y0ybw9fy(v=VS.80).aspx
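For reference, a quick decode of the two values (x87 control word: bits 0-5 are the exception masks, bits 8-9 the precision control PC, bits 10-11 the rounding control RC):

27Fh -> PC = 10b = 53-bit significand (double precision), round to nearest, all exceptions masked
37Fh -> PC = 11b = 64-bit significand (extended precision), round to nearest, all exceptions masked - this is also the value finit loads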
Interesting - so the newer languages which perform their own linking are stuck with 53 bit precision..
53 bits ought to be enough for anybody. ~Bill Gates
No, because they can still call the _controlfp() API function (as MSDN says, through P/Invoke with .NET), or write their own code that reads the control word with fstcw, modifies it, and loads it back with fldcw.
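With the CRT that is _controlfp(_PC_64, _MCW_PC); by hand it is just a few instructions. A minimal sketch in FASM syntax (untested, meant to run once near the top of start:), which raises the PC field to 64-bit while leaving rounding control and the exception masks as the OS set them:

        sub     esp, 4               ; scratch slot for the control word
        fstcw   word [esp]           ; read the current CW (27Fh at app-start here)
        or      word [esp], 0300h    ; set PC (bits 8-9) to 11b = 64-bit significand
        fldcw   word [esp]           ; load the modified CW back into the FPU
        add     esp, 4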
However, this is mostly theoretical, as most newer languages only support single- and double-precision datatypes anyway. That mirrors modern CPUs, most of which only support single and double precision; things like 80-bit precision on x86 and 96-bit precision on 68k hardware are legacy-only... Intel/AMD opted not to implement an 80-bit mode in SSE, yet they advise using SSE instead of x87, and x87 is deprecated in long mode.
IIRC, in Java floating-point precision is strictly enforced, meaning that the compiler/JVM should always generate exactly the same results. This is different from e.g. C++, where a compiler may optimize certain operations, so you get some 'free' extra precision because operations are done directly on the FPU stack, as opposed to writing the result out to a single- or double-precision memory variable and reading it back.
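In x87 terms the difference looks roughly like this (a sketch; a, b, c, tmp and result are assumed qword variables):

; intermediate kept on the FPU stack: with PC at 64-bit the product a*b keeps
; its full 64-bit significand and is only rounded at the final store
        fld     qword [a]
        fmul    qword [b]
        fadd    qword [c]
        fstp    qword [result]

; intermediate forced through a double in memory, as strict single/double
; semantics require: a*b is rounded to 53 bits by the fstp/fld round trip
        fld     qword [a]
        fmul    qword [b]
        fstp    qword [tmp]
        fld     qword [tmp]
        fadd    qword [c]
        fstp    qword [result]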
Ultrano, perhaps you used finit when you tested on your AthlonXP? I've tried the following program on an AMD Athlon64 with WinXP SP3:
include 'win32axp.inc'

.code
start:
        xor     eax, eax
        push    eax                  ; zeroed dword to receive the app-start CW
        fstcw   word [esp]           ; control word as set at program start
        finit                        ; reinitialize the FPU (loads CW = 37Fh)
        push    eax                  ; zeroed dword to receive the post-finit CW
        fstcw   word [esp]           ; control word after finit
        invoke wsprintf, buff, <"finit CW = %X", 13, 10, "app-start CW = %X">
        add     esp, 4*4             ; wsprintf is cdecl: drop its two args plus the two CW dwords
        invoke MessageBox, 0, buff, "FPU Control Word", 0
        invoke ExitProcess, 0

.data
        buff rb 256

.end start
But it showed these results:
---------------------------
FPU Control Word
---------------------------
finit CW = 37F
app-start CW = 27F
I've also tried on Win2k3 with an AMD Athlon (K7, not XP), and still got the same results.
I have attached the program in case you want to test.
Higher precision can be achieved with various numerical methods. And the main audience for high precision is mathematicians, who prefer to optimize their algorithm to reduce its complexity by 10 orders of magnitude rather than optimize their code to gain 20-50%. The main audience for sheer power is game engines, which don't need anything more than 64-bit. I guess that's why we won't see high-precision SIMD anywhere soon.
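One concrete example of such a method: Knuth's TwoSum recovers the rounding error of a double addition exactly, which is the building block of 'double-double' arithmetic (roughly 106 bits of significand built from plain doubles). A sketch in FASM syntax (a, b, s, t1 and err are assumed qword variables; it relies on the precision control being at 53 bits, i.e. the Windows default discussed above):

        fld     qword [a]
        fadd    qword [b]            ; a + b
        fst     qword [s]            ; s = round(a + b)
        fsub    qword [b]            ; s - b          (= a')
        fst     qword [t1]
        fsubr   qword [s]            ; s - a'         (= b')
        fsubr   qword [b]            ; b - b'         (= db)
        fld     qword [a]
        fsub    qword [t1]           ; a - a'         (= da)
        faddp   st1, st0             ; err = da + db
        fstp    qword [err]          ; (s, err) now hold a + b exactly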
In C#, floats and doubles are defined as 32-bit and 64-bit, respectively, and are required to comply with IEEE-754, regardless of the underlying hardware. Too bad there is no SIMD support in C# and/or .NET yet :/
My point exactly, so SpooK wasn't too far off with "53 bits ought to be enough for anybody. ~Bill Gates".
It's just that Bill Gates isn't the only one who thinks that way. Both software and hardware engineers have long dropped any special precision support, and stick only to IEEE-754 32-bit and 64-bit datatypes now (or in some cases, even lower, like the 16-bit and 24-bit float types in Direct3D 9, all in the name of performance).
In the case of mathematicians... if they need more than double precision, they probably want a LOT more than double precision, so the move from 64-bit to 80-bit datatypes is probably not relevant enough.
In the case of regular 'application programmers'... if you need more than double precision, most likely you're doing it wrong, and you might want to consult a mathematician to help you improve the stability of your equations within the 'limits' of double-precision.
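A classic example of that kind of fix is compensated (Kahan) summation, which recovers most of the error of a long running sum without needing more than double precision. A rough sketch in FASM syntax (sum and comp are assumed qword scratch variables; esi points at an array of doubles, ecx holds a non-zero count):

        fldz
        fst     qword [comp]         ; comp = 0.0 (running compensation)
        fstp    qword [sum]          ; sum  = 0.0
  sum_loop:
        fld     qword [esi]          ; x
        fsub    qword [comp]         ; y = x - comp
        fld     qword [sum]          ; sum, y
        fadd    st0, st1             ; t = sum + y
        fld     st0                  ; t, t, y
        fsub    qword [sum]          ; t - sum, t, y
        fsub    st0, st2             ; (t - sum) - y = new comp
        fstp    qword [comp]         ; comp updated; t, y left
        fstp    qword [sum]          ; sum = t; y left
        fstp    st0                  ; drop y
        add     esi, 8
        dec     ecx
        jnz     sum_loop             ; the compensated total ends up in [sum]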
For a large part, game engines actually use single-precision. Direct3D 10 and earlier only supports single-precision datatypes period, and while OpenGL pretends to support double-precision datatypes, in most cases this is fake, since the underlying hardware is designed for games and Direct3D, and doesn't implement double-precision.
Only the most recent GPUs (late DX10 and DX11 models) are capable of double-precision, but that is mostly for GPGPU, although Shader Model 5 does allow you to use double-precision variables in your code (but not as global variables). I just don't see many games making use of it, since single-precision has never really been an issue in most games.
Thanks, guys! Especially LocoDelAssembly.
I had someone else check fstcw on the Athlon, and it turns out he did call finit beforehand; thus the difference.
(the test code was placed just after start: in asm projects on both sides, no CRT involved)
On this c2d, app-start is 27Fh and finit makes it 37Fh.
Anyway, 53 bits were enough for my calculations, but barely - it was a scientific app.