Hi.

I need to truncate a floating point number (a double actually) to n bits of precision (i.e. clearing the low-order 53-n bits of the mantissa).

Is it possible to work with floats at the bit level (not as the number they represent)?

Because I can't load a double into a reg... Maybe loading the first half, and then the second one, and then ANDing with the appropriate mask? Would that work?

Any ideas/code/links?

Thanks.

Loading the first half and then the second would probably work; I can't see why it wouldn't. You could also do the entire operation in MMX registers.
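As a rough illustration of the "two halves" idea, here is a minimal C sketch rather than anything posted in the thread. It assumes a little-endian machine (as on x86) and IEEE 754 doubles, so the second 32-bit half holds the sign, the exponent and the top 20 mantissa bits. The function name `mask_double_halves` and its parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: AND each 32-bit half of a double with its own mask.
   Assumes little-endian storage: half[0] is the low dword, half[1] the high one. */
static double mask_double_halves(double x, uint32_t mask_hi, uint32_t mask_lo)
{
    uint32_t half[2];

    memcpy(half, &x, sizeof half);   /* copy the raw bits out of the double */
    half[0] &= mask_lo;              /* low 32 mantissa bits */
    half[1] &= mask_hi;              /* sign, exponent, top 20 mantissa bits */
    memcpy(&x, half, sizeof x);      /* copy the masked bits back */
    return x;
}
```

With `mask_hi = 0xFFFC0000` and `mask_lo = 0`, only the sign, the exponent and the two highest mantissa bits survive.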

According to your interpretation of "truncating" (i.e. clearing the low-order 53-n bits of the mantissa), this would effectively keep only the exponent portion of the REAL8 value. You could do that with the **fxtract** instruction and then store the unbiased exponent as an integer.

If keeping only the exponent portion of the REAL8 value was NOT what you intended to achieve with your "truncating", you should clarify what your intended goal is. We may then be in a better position to help you.

Raymond

This is what I did:

```
double truncar(double entrada, unsigned __int64 mask) {
    double retv = 0;
    __asm {
        // movq MM1, entrada;
        movq MM6, entrada;
        // pand MM1, mask;
        // movq retv, MM1;
        movq retv, MM6;
        // movd eax, MM1;
        // psrlq MM1, 32;
        // movd edx, MM1;
    }
    return(entrada);
}
```

The problem is that calling `truncar(e, 2)` (e being exp(1)) doesn't work =( It returns `e = -1.#IND00`... I just don't get it. Any ideas why?

Thanks.

PS: the idea is to truncate a double by clearing its lower-order bits (given in mask; for example, `0xFFFC000000000000` would leave only 2 bits).

Sure.

The idea is to truncate the mantissa, while keeping the exponent and sign. The doubles are like this:

1 bit sign

11 bits exponent

52 bits mantissa

So what I need is to clear the (actually) 52-n low order bits. This would keep the sign, exponent and higher n bits of the mantissa.

The code I posted above does just that, but the problem is that it doesn't return the right value, even when just before the return, the retv variable holds that value.
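For comparison, the truncation described here can also be sketched in plain C without MMX. This is only an illustration under stated assumptions: doubles are IEEE 754 binary64 (1 sign bit, 11 exponent bits, 52 stored mantissa bits), and the name `truncate_mantissa` and its parameter `n` are hypothetical, not part of the code posted in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep the sign, the exponent and the top n stored
   mantissa bits of x, clearing the remaining 52-n low-order bits.
   Assumes IEEE 754 binary64 doubles. */
static double truncate_mantissa(double x, unsigned n)
{
    uint64_t bits;
    uint64_t mask;

    if (n >= 52)
        return x;                              /* nothing to clear */

    mask = ~((UINT64_C(1) << (52 - n)) - 1);   /* n = 2 gives 0xFFFC000000000000 */
    memcpy(&bits, &x, sizeof bits);            /* read the raw bit pattern */
    bits &= mask;                              /* clear the low-order bits */
    memcpy(&x, &bits, sizeof x);               /* write the pattern back */
    return x;
}
```

For n = 2 the computed mask is the same `0xFFFC000000000000` mentioned above, so truncating exp(1) this way yields 2.5.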

Hi,

Remember that the mmx registers are aliased to the fpu registers. You need to add an emms instruction before the return.

Yesss, that worked =)

Why is that necessary?

Thanks a lot.

I just read the docs, thanks again.

You're welcome. Here is a possible solution:

```
double truncar(double entrada, unsigned __int64 mask) {
    double retval;
    __asm {
        movq  MM0, mask      ;// mm0 = mask
        pandn MM0, entrada   ;// mm0 = entrada & ~mask (clears the bits that are set in mask)
        movq  retval, MM0
        emms                 ;// leave MMX state so the FPU can be used again
        fld   retval         ;// the result is returned in ST(0)
    }
}
```