Hi all,
To compute the fraction of a float with SSE, I'm trying to use the following:
// HALF = 0.5
// MAGIC = 12582912.0 (2^23 + 2^22, forcing fraction bits out of the mantissa)
movss xmm0, x
subss xmm0, HALF
addss xmm0, MAGIC
subss xmm0, MAGIC
movss xmm1, x
subss xmm1, xmm0
movss y, xmm1
This corresponds almost completely to y = x - floor(x). Unfortunately, it doesn't work correctly for x = 1.0. The output should be 0.0 but it's 1.0.
So I was wondering whether anyone knew a way to make it behave exactly like y = x - floor(x).
Thanks!
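For anyone following along, here is a minimal scalar C sketch of the sequence above. It assumes round-to-nearest-even (the default MXCSR rounding mode) and float arithmetic carried out in single precision; the function name `frac_magic` is made up for illustration. It suggests why x = 1.0 comes out as 1.0: x - 0.5 lies exactly halfway between two integers, and adding MAGIC breaks the tie toward the even one, so odd integers are rounded down by one while even integers come out right.

```c
/* Scalar model of the posted movss/subss/addss sequence.
   frac_magic is a hypothetical name used only in this sketch. */
static float frac_magic(float x)
{
    const float HALF  = 0.5f;
    const float MAGIC = 12582912.0f;   /* 2^23 + 2^22 */

    /* volatile keeps the compiler from simplifying (t + MAGIC) - MAGIC to t */
    volatile float t = x - HALF;       /* subss xmm0, HALF  */
    t = t + MAGIC;                     /* addss xmm0, MAGIC: rounds to an integer, ties to even */
    t = t - MAGIC;                     /* subss xmm0, MAGIC */

    return x - t;                      /* subss xmm1, xmm0  */
}
```

Under these assumptions, `frac_magic(0.25f)` gives 0.25 and `frac_magic(2.0f)` gives 0, but `frac_magic(1.0f)` and `frac_magic(3.0f)` both give 1.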
SSE
movss xmm0,x
movss xmm1,xmm0
subss xmm0,HALF
cvtss2si eax,xmm0
cvtsi2ss xmm0,eax
subss xmm1,xmm0
movss y,xmm1
A bit optimized, I guess:
movss xmm0,x
subss xmm0,HALF
cvtss2si eax,xmm0
movss xmm1,x
cvtsi2ss xmm0,eax
subss xmm1,xmm0
movss y,xmm1
FPU
fld x
fld ST
fsub HALF
frndint
fsub
fstp y
I know the float->int and int->float conversion instructions, but unfortunately they are quite slow. For vectors it also requires extra movhlps instructions. The approach with the 'MAGIC' number keeps everything in the floating-point pipeline and is very compact. So I was hoping that maybe someone knew how to correct it for x = 1.0, preferably without extra overhead. Maybe I'm asking for the impossible but it would be really nice for the applications I'm working on to have a really fast way to compute a float's fraction.
You were almost there; you just had to do two extra checks:
1- Make sure that if the result is 1 it gets changed to 0
2- Mask out those pesky negative values
Here's a packed single FP vector version
.data
align 16
MAGIC dd 12582912.0, 12582912.0, 12582912.0, 12582912.0
HALF dd 0.5, 0.5, 0.5, 0.5
ONES dd 1.0, 1.0, 1.0, 1.0
MASKSIGN dd 7FFFFFFFh, 7FFFFFFFh, 7FFFFFFFh, 7FFFFFFFh
.code
;;in XMM0 out XMM0
Fraction:
ANDPS XMM0, DQWORD
MOVAPS XMM2, DQWORD
MOVAPS XMM5, DQWORD
MOVAPS XMM3, XMM0
MOVAPS XMM1, DQWORD
SUBPS XMM3, XMM2
ADDPS XMM3, XMM1
SUBPS XMM3, XMM1
SUBPS XMM0, XMM3
MOVAPS XMM4, XMM0
CMPPS XMM4, XMM5, 100b ;!= 1
PAND XMM0, XMM4
It should work for all positive and negative numbers.
x-Floor(x)
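To make the data flow of the routine easier to follow, here is a lane-by-lane C model of it. This is only a sketch: the memory operands in the listing are cut off after `DQWORD`, so the mapping of MASKSIGN, HALF, ONES and MAGIC to each instruction is an assumption based on how the registers are used afterwards. The helpers `f2u`, `u2f` and the name `fraction4` are made up for illustration, and round-to-nearest-even is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret bits between float and uint32_t. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* v plays the role of XMM0 (four packed singles), processed one lane at a time. */
static void fraction4(float v[4])
{
    for (int i = 0; i < 4; ++i) {
        /* ANDPS with MASKSIGN (assumed): clear the sign bit */
        float x = u2f(f2u(v[i]) & 0x7FFFFFFFu);

        /* volatile stops the compiler from cancelling the add/sub pair */
        volatile float t = x - 0.5f;   /* SUBPS XMM3, XMM2 (HALF, assumed)  */
        t = t + 12582912.0f;           /* ADDPS XMM3, XMM1 (MAGIC, assumed) */
        t = t - 12582912.0f;           /* SUBPS XMM3, XMM1                  */

        float y = x - t;               /* SUBPS XMM0, XMM3 */

        /* CMPPS with predicate 100b (not equal) against ONES: all-ones mask when y != 1 */
        uint32_t mask = (y != 1.0f) ? 0xFFFFFFFFu : 0u;
        v[i] = u2f(f2u(y) & mask);     /* PAND: a result of exactly 1 becomes 0 */
    }
}
```

With this model, the inputs {1.0, 3.0, 2.75, -1.25} come out as {0, 0, 0.75, 0.25}. The last lane shows the effect of the sign mask: the result is the fraction of |x|, whereas x - floor(x) for -1.25 would be 0.75.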
That's awesome, thanks!
I had to remove the sign masking to make it work with negative arguments though. I'm trying to get the behaviour of the C floor function. I also don't fully understand why you load HALF and ONES into registers. They're only used once so you can use them directly as source operands and save a few instructions (and registers). When executing it in a loop it's obviously best to load them from memory into registers first...
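A scalar C sketch of the variant without the sign mask, in case it helps others check the negative-argument behaviour. The name `frac_signed` is made up for illustration, round-to-nearest-even is assumed, and the trick is only expected to hold for inputs of moderate magnitude (well below 2^22); it is not a drop-in replacement for x - floor(x) over the whole float range.

```c
#include <stdint.h>
#include <string.h>

/* Magic-number fraction without sign masking, plus the "1 becomes 0" fix.
   frac_signed is a hypothetical name used only in this sketch. */
static float frac_signed(float x)
{
    /* volatile stops the compiler from cancelling the add/sub pair */
    volatile float t = x - 0.5f;
    t = t + 12582912.0f;               /* 2^23 + 2^22: rounds to an integer, ties to even */
    t = t - 12582912.0f;

    float y = x - t;

    /* Branch-free equivalent of the compare-and-AND step: zero y when it equals 1 */
    uint32_t bits;
    memcpy(&bits, &y, sizeof bits);
    bits &= (y != 1.0f) ? 0xFFFFFFFFu : 0u;
    memcpy(&y, &bits, sizeof y);
    return y;
}
```

For example, `frac_signed(-1.25f)` gives 0.75 and `frac_signed(-1.0f)` gives 0, which agrees with x - floor(x) for those inputs.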