Hi.
I need to truncate a floating point number (a double actually) to n bits of precision (ie: clearing the low order 53-n bits of the mantissa).
Is it possible to work with floats at the bit level (not as the number they represent)?
Because I can't load a double into a reg... Maybe loading the first half, and then the second one, and then ANDing with the appropriate mask? Would that work?
Any ideas/code/links?
Thanks.
Posted on 2003-09-02 20:24:29 by GogetaSSJ4
Loading the first, then the second would probably work, I can't see why it wouldn't. You could do the entire operation in mmx regs also.
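For reference, the same masking can also be sketched in plain C, without loading anything into a register by hand. This is only a sketch under a few assumptions: `truncar_bits` is a made-up name, `double` is taken to be a 64-bit IEEE 754 value, and the mask marks the bits to keep.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keeps only the bits of x that are set in mask.
   Assumes double is a 64-bit IEEE 754 value. */
double truncar_bits(double x, uint64_t mask) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the raw bit pattern out */
    bits &= mask;                     /* clear every bit not in mask */
    memcpy(&x, &bits, sizeof x);      /* copy the pattern back */
    return x;
}
```

With the mask 0xFFFC000000000000, 2.718281828459045 becomes 2.5, because only the sign, the exponent and the top two mantissa bits survive.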
Posted on 2003-09-02 20:40:43 by Eóin
According to your interpretation of "truncating" (ie: clearing the low order 53-n bits of the mantissa), this would effectively keep only the exponent portion of the REAL8 value. You could do that with the fxtract instruction and then store the unbiased exponent as an integer.

If keeping only the exponent portion of the REAL8 value was NOT what you intended to achieve with your "truncating", you should clarify what your intended goal is. We may then be in a better position to help you.

Raymond
Posted on 2003-09-02 22:05:04 by Raymond
This is what I did:


double truncar(double entrada, unsigned __int64 mask) {
    double retv = 0;
    __asm {
        // movq MM1, entrada;
        movq MM6, entrada;
        // pand MM1, mask;

        // movq retv, MM1;
        movq retv, MM6;
        // movd eax, MM1;
        // psrlq MM1, 32;
        // movd edx, MM1;
    }
    return(entrada);
}


The problem is that calling
truncar(e, 2)
(e being exp(1)) doesn't work =( it returns
e = -1.#IND00
...I just don't get it.
Any ideas why?
Thanks.

PS: the idea is to truncate a double by clearing its lower-order bits (given in mask, for example:
0xFFFC000000000000
would leave only 2 bits)
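
In case it is useful, here is a rough sketch of how such a mask could be built from n instead of being written out by hand. It assumes the usual 64-bit double layout (1 sign bit, 11 exponent bits, 52 stored mantissa bits); `keep_mask` is a made-up name, not something from the code above.

```c
#include <stdint.h>

/* Hypothetical helper: builds a mask that keeps the sign bit, the 11
   exponent bits and the top n of the 52 stored mantissa bits.
   n is clamped to the range 0..52. */
uint64_t keep_mask(int n) {
    if (n < 0)  n = 0;
    if (n > 52) n = 52;
    /* Start with all 64 bits set, then shift out the 52 - n low-order
       bits. The shift count never exceeds 52, so it is well defined. */
    return ~UINT64_C(0) << (52 - n);
}
```

Under those assumptions keep_mask(2) gives 0xFFFC000000000000, the value in the example.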
Posted on 2003-09-03 00:17:40 by GogetaSSJ4

Sure.
The idea is to truncate the mantissa, while keeping the exponent and sign. The doubles are like this:
1 bit sign
11 bits exponent
52 bits mantissa
So what I need is to clear the (actually) 52-n low order bits. This would keep the sign, exponent and higher n bits of the mantissa.
The code I posted above does just that, but the problem is that it doesn't return the right value, even when just before the return, the retv variable holds that value.
Posted on 2003-09-03 00:35:03 by GogetaSSJ4
Hi,

Remember that the mmx registers are aliased to the fpu registers. You need to add an emms instruction before the return.
Posted on 2003-09-03 01:10:06 by Dr. Manhattan

Yesss, that worked =)
Why is that necessary?
Thanks a lot.
Posted on 2003-09-03 01:25:00 by GogetaSSJ4


I just read the docs, thanks again.
Posted on 2003-09-03 01:29:09 by GogetaSSJ4
You're welcome. Here is a possible solution:



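double truncar(double entrada, unsigned __int64 mask) {
    double retval;
    __asm {
        movq MM0, mask     ;// mm0 = mask (the bits to keep)
        pand MM0, entrada  ;// mm0 = entrada & mask
        movq retval, MM0   ;// store the masked bit pattern
        emms               ;// clear the MMX state so the FPU can be used again
        fld retval         ;// the result is left in ST(0), where a double is returned
    }
}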
Posted on 2003-09-03 01:44:30 by Dr. Manhattan