Posted on 2005-04-14 22:45:14 by luisvalencia
1 = 1*10^0

213 = 2.13*10^2

bin
1100 = 1.100*10^11 (all digits binary, i.e. 1.1 * 2^3)

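The only difference between float and double is the size of the fields: a float has 8 bits of biased exponent (bias 127) and 23 bits of mantissa, a double has 11 bits of biased exponent (bias 1023) and 52 bits of mantissa.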
Posted on 2005-04-14 23:19:32 by rea
In assembly, how can I do it if the registers are 32 bits long, not 64 bits?

I must give it a float and it must return a double.
Posted on 2005-04-15 00:13:19 by luisvalencia
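``
REAL8_REAL4 PROC num8:PTR, num4:PTR
    mov eax, num4            ; eax -> source REAL4
    mov edx, num8            ; edx -> destination REAL8
    fld REAL4 PTR [eax]      ; load the float onto the FPU stack
    fstp REAL8 PTR [edx]     ; store it back as a double and pop
    ret
REAL8_REAL4 ENDP

MyNum8 REAL8 ?
MyNum4 REAL4 3.14

invoke REAL8_REAL4, ADDR MyNum8, ADDR MyNum4
``
...I leave the explanation to you. :)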
Posted on 2005-04-15 01:16:50 by bitRAKE

In assembly, how can I do it if the registers are 32 bits long, not 64 bits?

I must give it a float and it must return a double.

If you can't move a pile of dirt with one shovel, you move it one shovel load at a time.

In addition to adding zeroes to the right of your float, you will need to change the number of bits in the exponent. Without looking it up, I'm guessing there's 11 or 12 bits of exponent in the double format.
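For reference, the double format has 11 exponent bits. Here is a rough C sketch of the "one shovel load at a time" idea, building the double as two 32 bit halves. The function name `widen_float_bits` is made up for illustration, and it assumes the input is zero or a normal number (denormals, infinities and NaNs are not handled).

```c
#include <stdint.h>

/* Hypothetical helper: widen the raw bits of an IEEE 754 float into the
   raw bits of a double, returned as two 32 bit halves.
   Assumption: the input is zero or a normal number. */
void widen_float_bits(uint32_t f, uint32_t *hi, uint32_t *lo)
{
    uint32_t sign     = f & 0x80000000u;   /* sign stays in the top bit    */
    uint32_t exponent = (f >> 23) & 0xFFu; /* 8 bit biased exponent        */
    uint32_t mantissa = f & 0x007FFFFFu;   /* 23 stored mantissa bits      */

    if (exponent == 0 && mantissa == 0) {  /* +0.0 or -0.0                 */
        *hi = sign;
        *lo = 0;
        return;
    }

    /* Re-bias the exponent: take off 127, put on 1023. */
    exponent = exponent - 127u + 1023u;

    /* The mantissa grows from 23 to 52 bits by appending 29 zero bits.
       Its top 20 bits land in the high word, its low 3 bits at the top
       of the low word. */
    *hi = sign | (exponent << 20) | (mantissa >> 3);
    *lo = mantissa << 29;
}
```

In assembly the same steps map onto a handful of shifts, ands and ors on two 32 bit registers.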
Posted on 2005-04-15 14:26:37 by tenkey

look up the fild instruction.
Cheers,
RandyHyde
Posted on 2005-04-15 21:04:05 by rhyde
For those of you who haven't noticed, this is a class assignment in doing fp the hard way.
Posted on 2005-04-16 00:13:15 by tenkey
yep, either that or this guy loves the fpu and wants to know it in detail
Posted on 2005-04-16 12:20:14 by evlncrn8
In decimal scientific notation, there is only ever one digit to the left of the decimal point; the same is true for IEEE floating point mathematics. However, in accordance with the IEEE floating point spec, in order to save a bit of storage, there is an implied 1 at the top of the mantissa (effectively, all calculations must be made as if a 24 bit mantissa were available in the case of a 32 bit float, even though only 23 bits are stored).
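To make the implied 1 concrete, here is a small C sketch that pulls a float apart into its fields. The names `float_fields` and `split_float` are invented for this example, and it assumes a normal (non-zero, non-denormal, finite) input.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three parts of a 32 bit float. */
typedef struct {
    uint32_t sign;        /* 0 or 1                                   */
    int32_t  exponent;    /* unbiased exponent (stored value - 127)   */
    uint32_t significand; /* 24 bits: implied 1 plus 23 stored bits   */
} float_fields;

/* Assumption: f is a normal number, so the implied leading 1 applies. */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields r;

    memcpy(&bits, &f, sizeof bits);        /* reinterpret the raw bits */
    r.sign        = bits >> 31;
    r.exponent    = (int32_t)((bits >> 23) & 0xFFu) - 127;
    r.significand = (bits & 0x007FFFFFu) | 0x00800000u; /* restore the 1 */
    return r;
}
```

For 12.0f (binary 1100 = 1.1 * 2^3) this gives an exponent of 3 and a significand of 0xC00000, i.e. binary 1.1 with the leading 1 sitting in bit 23.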

In order to convert an integer to a floating point value, we must determine the index of its most significant set bit. This will be a value between 0 and 31 if the initial integer value is non-zero.

Note that as we are converting an integer value to a floating point, we will only ever need to add to the exponent (exponents less than 127 indicate fractional values less than 1). Nor can we express a NaN, or infinite value using a 32 bit integer, so we don't need to worry about these either.

``
IF our integer == 0
  SIGN     = 0
  EXPONENT = 0
  MANTISSA = 0
ELSE
  SIGN     = 0
  EXPONENT = 127 + index of most significant set bit
  MANTISSA = our integer shifted (left or right) so the most significant bit is in bit position 23 (counting from 0).
             Bit 23 is promptly thrown away!
``
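The pseudocode above could be sketched in C roughly as follows. Bit positions are counted from 0, so the implied 1 lands in bit 23. `msb_index` and `uint_to_float_bits_trunc` are made-up names, and the loop in `msb_index` simply stands in for what `bsr` would do in assembly.

```c
#include <stdint.h>

/* Index (0..31) of the most significant set bit; x must be non-zero.
   A plain loop standing in for the x86 bsr instruction. */
static int msb_index(uint32_t x)
{
    int index = 0;
    while (x >>= 1)
        index++;
    return index;
}

/* Hypothetical helper: unsigned integer to raw float bits, truncating
   any bits that do not fit in the 23 bit mantissa. */
uint32_t uint_to_float_bits_trunc(uint32_t value)
{
    int index;
    uint32_t exponent, mantissa;

    if (value == 0)
        return 0;                          /* sign, exponent, mantissa all 0 */

    index    = msb_index(value);
    exponent = 127u + (uint32_t)index;

    if (index > 23)
        mantissa = value >> (index - 23);  /* too wide: low bits are dropped */
    else
        mantissa = value << (23 - index);  /* narrow: pad with zeroes        */

    mantissa &= 0x007FFFFFu;               /* throw away the implied bit 23  */
    return (exponent << 23) | mantissa;
}
```

Note the two shift directions: values wider than 24 bits shift right and lose precision, everything else shifts left.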

The above is of course assuming that you are dealing with an unsigned integer, and that you want to truncate the integer rather than round. Things get more complicated if you wish to deal with either case.

From now on I won't re-type the code to deal with zeros, it should be considered to be implied.

``
  SIGN     = sign of integer value
  temp     = ABS(our integer)
  index    = the index of temp's most significant set bit
  if (index > 23)
    temp  += 1 << (index - 24) ; Here we do our rounding!
  EXPONENT = 127 + index of most significant set bit of temp
  MANTISSA = temp shifted (left or right) so the most significant bit is in bit position 23 (counting from 0).
             Bit 23 is promptly thrown away!
``
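A C sketch of the signed, rounding version, again counting bit positions from 0. The names are invented for illustration. One assumption worth flagging: this rounds halves upward in magnitude, which is what the pseudocode describes, whereas the FPU's default mode rounds exact ties to even, so results can differ by one unit in the last place for exact ties.

```c
#include <stdint.h>

/* Index (0..31) of the most significant set bit; x must be non-zero.
   A plain loop standing in for the x86 bsr instruction. */
static int msb_index(uint32_t x)
{
    int index = 0;
    while (x >>= 1)
        index++;
    return index;
}

/* Hypothetical helper: signed integer to raw float bits, rounding
   halves away from zero rather than truncating. */
uint32_t int_to_float_bits_round(int32_t value)
{
    uint32_t sign, temp, mantissa;
    int index;

    if (value == 0)
        return 0;

    sign = (value < 0) ? 0x80000000u : 0u;
    temp = (value < 0) ? 0u - (uint32_t)value : (uint32_t)value; /* ABS */

    index = msb_index(temp);
    if (index > 23) {
        temp += 1u << (index - 24);  /* add half of the last bit we keep */
        index = msb_index(temp);     /* the carry may move the top bit   */
    }

    if (index > 23)
        mantissa = temp >> (index - 23);
    else
        mantissa = temp << (23 - index);

    mantissa &= 0x007FFFFFu;         /* drop the implied bit 23          */
    return sign | ((127u + (uint32_t)index) << 23) | mantissa;
}
```

The second call to `msb_index` matters: 0x7FFFFFFF rounds up to 2^31, so its top bit moves from position 30 to 31.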

The actual code at this point is rather trivial.

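Use of a bit scan opcode will be needed: bsr gives the index of the most significant set bit (bsf scans from the other end and finds the least significant one).

Several cmps & jmps will be needed to branch based on conditions: the code path will be different if the integer is zero, or if the integer has a bit set higher than position 23 (you need to shift the other way).

One and will be needed to remove the implied bit (bit 23, counting from 0) after the shift.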

In the case of the double, the above holds true, except the bias for the exponent is 1023, and its position is bits 52..62 (11 bits). Similarly, the mantissa occupies bits 0..51 (52 bits), which means that no truncation is ever necessary (a 32 bit value will always fit in 52 bits!!!), so you can optimise that section out of your function, and the value will only ever be shifted in one direction. However, the 52 bit mantissa does not fit within a 32 bit register, so there will be a certain amount of bit fiddling to do (it should just be shifts, ands and ors).
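As a rough illustration of the bit fiddling for the double case, here is a C sketch that only ever works on 32 bit quantities, the way two registers would. The function names are made up, and the `hi`/`lo` split assumes the high word holds bits 32..63 of the double.

```c
#include <stdint.h>

/* Index (0..31) of the most significant set bit; x must be non-zero.
   A plain loop standing in for the x86 bsr instruction. */
static int msb_index(uint32_t x)
{
    int index = 0;
    while (x >>= 1)
        index++;
    return index;
}

/* Hypothetical helper: signed integer to raw double bits, produced as
   two 32 bit halves. No rounding is needed, every int32 fits exactly. */
void int_to_double_halves(int32_t value, uint32_t *hi, uint32_t *lo)
{
    uint32_t sign, temp, m_hi, m_lo;
    int index, shift;

    if (value == 0) {
        *hi = 0;
        *lo = 0;
        return;
    }

    sign  = (value < 0) ? 0x80000000u : 0u;
    temp  = (value < 0) ? 0u - (uint32_t)value : (uint32_t)value; /* ABS */
    index = msb_index(temp);

    /* Always a left shift: move the top bit to bit 52 of the 64 bit
       value, which is bit 20 of the high word. shift is 21..52. */
    shift = 52 - index;
    if (shift >= 32) {
        m_hi = temp << (shift - 32);  /* everything lands in the high word */
        m_lo = 0;
    } else {
        m_hi = temp >> (32 - shift);  /* upper part of the shifted value   */
        m_lo = temp << shift;         /* lower part, upper bits fall off   */
    }

    m_hi &= 0x000FFFFFu;              /* drop the implied bit (bit 52)     */
    *hi = sign | ((1023u + (uint32_t)index) << 20) | m_hi;
    *lo = m_lo;
}
```

On a little-endian x86 machine the low half would be stored at the lower address and the high half four bytes above it.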

Really, if you can't work out what you are meant to do from this, I'd give up and go home. The next step is to give you the answer verbatim, which would be cheating. I don't want to cheat, you don't want to cheat, nobody here wants to cheat, so that's simply not an option. Perhaps if you still don't understand, you should show this to your teacher / professor and ask for more help.

Mirno
Posted on 2005-04-16 18:22:29 by Mirno