In NASM syntax, what is the difference between dd, dq, and dt when they are used to store a floating-point number?

Do they all reserve 4 bytes for the float number?

There are three floating-point formats on IA-32:

REAL4 (single precision), 'float' in C, takes a dword (32 bits) - dd

REAL8 (double precision), 'double' in C, takes a qword (64 bits) - dq

REAL10 (extended precision), 'long double' extension in C, takes ten bytes (80 bits) - dt
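
To see what a particular C compiler actually does with these three types, a minimal sketch along these lines can help. The struct `fp_info` and the functions `describe_fp_types` and `print_fp_types` are hypothetical names made up for this illustration, and the pairing with dd/dq/dt is the usual IA-32 convention rather than something the C language promises.

```c
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper type: describes one C floating-point type. */
struct fp_info {
    const char *c_type;    /* C type name */
    const char *nasm_dir;  /* NASM directive that usually matches on IA-32 */
    size_t      bytes;     /* sizeof() as seen by this compiler */
    int         mant_bits; /* significand digits (bits, assuming FLT_RADIX == 2) */
};

/* Fills out[0..2] for float, double and long double. */
void describe_fp_types(struct fp_info out[3])
{
    out[0] = (struct fp_info){ "float",       "dd", sizeof(float),       FLT_MANT_DIG  };
    out[1] = (struct fp_info){ "double",      "dq", sizeof(double),      DBL_MANT_DIG  };
    out[2] = (struct fp_info){ "long double", "dt", sizeof(long double), LDBL_MANT_DIG };
}

/* Prints one line per type. */
void print_fp_types(void)
{
    struct fp_info t[3];
    describe_fp_types(t);
    for (int i = 0; i < 3; i++)
        printf("%-11s (%s): %zu bytes, %d significand bits\n",
               t[i].c_type, t[i].nasm_dir, t[i].bytes, t[i].mant_bits);
}
```

Calling `print_fp_types()` from `main` prints the table. Note that sizeof(long double) can be larger than ten bytes even when the value itself is the 80-bit format, because some compilers pad it for alignment, so the significand width is the more reliable indicator.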

Usually you can get away with using REAL4, unless you need really precise stuff. It's worth noting that, if you tweak the precision-control bits in the x87 control word, REAL4 quantities will be faster to do calculations with (can't remember the specifics, nor whether it's still true with the latest-and-greatest processors, but it made a difference at least a few years back). Of course there's also the fact that REAL4 is half the size of REAL8, so you can fit twice as many in the same cache... and if you use SSE/SSE2, you can process 4 REAL4s in one of those 128-bit xmm registers.
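
The size argument is plain arithmetic, and a small sketch makes it concrete. `elems_per_block` is a hypothetical helper, and the 64-byte cache line is an assumption about the target CPU; only the 16-byte xmm register width comes from SSE itself.

```c
#include <stddef.h>

enum {
    XMM_BYTES        = 16,  /* one 128-bit xmm register */
    CACHE_LINE_BYTES = 64   /* assumed cache line size, varies by CPU */
};

/* How many elements of elem_bytes fit in a block of block_bytes. */
size_t elems_per_block(size_t block_bytes, size_t elem_bytes)
{
    return block_bytes / elem_bytes;
}
```

With 4-byte floats, `elems_per_block(XMM_BYTES, sizeof(float))` gives 4 lanes per register, while 8-byte doubles give 2. The same ratio applies to a cache line, or to the cache as a whole.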

Hello f0dder,

There are three floating-point formats on IA-32:

REAL4 (single precision), 'float' in C, takes a dword (32 bits) - dd

REAL8 (double precision), 'double' in C, takes a qword (64 bits) - dq

REAL10 (extended precision), 'long double' extension in C, takes ten bytes (80 bits) - dt

As a matter of fact, 'long double' has been standard C since at least C89. Also, these C types are not guaranteed to take that number of bytes; for example, 'long double' takes 8 bytes in VC.

Usually you can get away with using REAL4, unless you need really precise stuff. It's worth noting that, if you tweak the precision-control bits in the x87 control word, REAL4 quantities will be faster to do calculations with (can't remember the specifics, nor whether it's still true with the latest-and-greatest processors, but it made a difference at least a few years back). Of course there's also the fact that REAL4 is half the size of REAL8, so you can fit twice as many in the same cache... and if you use SSE/SSE2, you can process 4 REAL4s in one of those 128-bit xmm registers.

The better practice is to use double-precision for all intermediate calculations.

As a matter of fact, 'long double' has been standard C since at least C89. Also, these C types are not guaranteed to take that number of bytes; for example, 'long double' takes 8 bytes in VC.

Hm, okay... first, I was of course making the assumption of a 32bit platform (since this is a win32 forum ;)), so float+double should be true for all compilers. But you're saying "long double" won't get me an extended-precision FP number? I'm pretty sure there is (or used to be) a way to get hold of one of these, but that it was non-standard C...

The better practice is to use double-precision for all intermediate calculations.

Use doubles if you need precision, don't need speed, or don't care. Use floats if you need speed, or know that you don't need the precision and want to conserve memory. Blindly using doubles isn't exactly "better practice".

long double may exist as a datatype in most C/C++ compilers (does it?), but since most FPUs only handle 32- and 64-bit floats, it has no meaning.

And at least in Windows, it is the default to use double precision.

But we should be using SSE/SSE2 instead of x87, really :P

Discussing the fastest x87 code is moot, since SSE/SSE2 is always faster.

Quoting myself:

Of course there's also the fact that REAL4 is half the size of REAL8, so you can fit twice as many in the same cache... and if you use SSE/SSE2, you can process 4 REAL4s in one of those 128-bit xmm registers.

...and process them in the same time as if you worked on two doubles, right? So, effectively twice the speed.

Besides, there are still a lot of people with processors even without SSE support, and since these have lower clock frequencies the speed of x87 code matters even more on these.

I personally don't bother about non-SSE systems anymore. Not that I write everything in SSE though. Generally I just use what my compiler cooks up :P

Hello f0dder,

Hm, okay... first, I was of course making the assumption of a 32bit platform (since this is a win32 forum ;)), so float+double should be true for all compilers.

That is incorrect. The only promises the standard gives you are that the value set of 'double' includes the value set of 'float' (and likewise for 'long double' and 'double'). That means that sizeof(float)==sizeof(double)==sizeof(long double) can happen on a conforming implementation.
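
As a rough way to check this on a given compiler, the limits in <float.h> can be compared directly. This is only a sketch: `fp_ranges_nested` and `fp_sizes_all_equal` are hypothetical names, and comparing the precision and maximum values is a proxy for the subset rule, not a full proof of it.

```c
#include <float.h>

/* Returns 1 if precision and range never shrink from float to double
   to long double, which the subset rule implies. */
int fp_ranges_nested(void)
{
    return FLT_MANT_DIG <= DBL_MANT_DIG
        && DBL_MANT_DIG <= LDBL_MANT_DIG
        && (double)FLT_MAX <= DBL_MAX
        && (long double)DBL_MAX <= LDBL_MAX;
}

/* Returns 1 if all three types have the same size, which the standard
   permits but common IA-32 compilers do not do. */
int fp_sizes_all_equal(void)
{
    return sizeof(float) == sizeof(double)
        && sizeof(double) == sizeof(long double);
}
```

On a conforming implementation `fp_ranges_nested()` should return 1. Whether `fp_sizes_all_equal()` returns 0 or 1 is up to the implementation, which is exactly the point being made.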

But you're saying "long double" won't get me an extended-precision FP number? I'm pretty sure there is (or used to be) a way to get hold of one of these, but that it was non-standard C...

'long double' is an extended-precision (10 bytes) type on other compilers on the IA32 platform... for example Borland's. AFAIK, there's no way to use extended-precision in VC without resorting to non-C solutions.

Use doubles if you need precision, don't need speed, or don't care. Use floats if you need speed, or know that you don't need the precision and want to conserve memory. Blindly using doubles isn't exactly "better practice".

Considering "95% of the folks out there are completely clueless about floating-point."*, it is quite the better practice to use doubles for intermediate calculations. Anyone who has been doing numeric programming for a while can tell you of 'floating-point' horrors, rounding here, rounding there, rounding everywhere. So, if you're not one of the 5% who spend their lifetime looking for rounding problems, and you remotely care about the accuracy of the results, I suggest you stick to doubles.

* Taken from "How Java's Floating-Point Hurts Everyone Everywhere", originally said by J. Gosling, 28 Feb, 1998.
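
A small illustration of the kind of rounding drift being described, assuming IEEE 754 single and double precision. The function names `sum_in_float`, `sum_in_double` and `abs_diff` are made up for this sketch; the only difference between the two sums is the type of the running total.

```c
#include <stddef.h>

/* Adds 0.1f to a running total n times, rounding to float after every step. */
float sum_in_float(size_t n)
{
    volatile float acc = 0.0f;  /* volatile: force a float-sized store each step */
    for (size_t i = 0; i < n; i++)
        acc = acc + 0.1f;
    return acc;
}

/* Same additions, but the intermediate total is kept in a double. */
double sum_in_double(size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += 0.1f;            /* the float constant is widened to double */
    return acc;
}

/* Absolute difference, written out to avoid needing the math library. */
double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}
```

For n = 1000000 the ideal result is 100000. Comparing `abs_diff(sum_in_float(n), 100000.0)` with `abs_diff(sum_in_double(n), 100000.0)` shows the float total drifting much further from it than the double total, even though every input is the same float constant.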

So, if you're not one of the 5% who spend their lifetime looking for rounding problems, and you remotely care about the accuracy of the results, I suggest you stick to doubles.

I prefer f0dder's approach: try and educate the 95% (sticking to doubles won't save your butt in every situation anyway, and you only get extra performance penalties for it).

Hello Scali,

I prefer f0dder's approach: try and educate the 95% (sticking to doubles won't save your butt in every situation anyway, and you only get extra performance penalties for it).

Personally, I don't consider myself part of the 5%, and I also don't assume readers of this thread to be. Maybe you and f0dder feel comfortable making such a suggestion, I sure don't.

I am part of the 5%... Computer Graphics/CAD/Computational Geometry/etc tend to bring up lots of numerical code.

So if there's anything you want to know, feel free to ask ;)

That is incorrect. The only promises the standard gives you are that the value set of 'double' includes the value set of 'float' (and likewise for 'long double' and 'double'). That means that sizeof(float)==sizeof(double)==sizeof(long double) can happen on a conforming implementation.

Yes, I know **The Standard** would say something like "sizeof(float) <= sizeof(double) <= sizeof(long double)" - but I'm talking about win32, where I'm pretty sure any compiler will do float=REAL4, double=REAL8. Interesting that VC no longer supports REAL10 (it did in the 16bit days), but then again, who cares. Can't blame you for being pedantic though, I tend to be so myself ;). Btw, seems that AIX has 128bit long double - neat :p

As for your blabla, I quote myself again:

Use doubles if you need precision, don't need speed, or don't care. Use floats if you need speed, or know that you don't need the precision and want to conserve memory.

So, "use floats if it makes sense", basically. I tend to use doubles myself for "whatever" stuff, but not when I'm interested in speed. Guess I should have a look at my old broken software renderer and see what kind of speed difference float->double would bring :)

I tend to use floats for everything, unless it turns out that I have to use doubles. Floats already have quite a large range and decent precision, they go a long way :)

Interesting that VC no longer supports REAL10 (it did in the 16bit days)

For those interested, the reason behind this is described in the paper I provided a link for in one of my previous posts in this thread.