Hi:
I was wondering what the fastest way is to add up the elements of a single xmm register.
e.g.
An xmm register contains 4 single-precision floating-point values:
x3, x2, x1, x0
How would you get the result of x3+x2+x1+x0?
If I recall correctly, SSE3 has a faster way of doing this 'horizontal' addition, but I need the code in plain SSE (not SSE2 or later).
Cheers
SSE3 has "HADDPS". With plain SSE (or SSE2) you have to do additions interleaved with shuffles.
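For what it's worth, here is a minimal sketch of the "additions interleaved with shuffles" approach, written with C intrinsics from xmmintrin.h (SSE only, nothing from SSE2 or later). The function name hsum_ps_sse1 is made up for illustration, and this is just one possible shuffle sequence, not necessarily the one posted on the FASM board.

```c
#include <xmmintrin.h>  /* SSE (SSE1) intrinsics only */

/* Hypothetical helper: returns x3 + x2 + x1 + x0 for v = (x3, x2, x1, x0). */
static float hsum_ps_sse1(__m128 v)
{
    /* MOVHLPS: copy the upper two lanes down, giving (x3, x2, x3, x2). */
    __m128 hi = _mm_movehl_ps(v, v);

    /* ADDPS: lane 0 now holds x0 + x2, lane 1 holds x1 + x3. */
    __m128 sum2 = _mm_add_ps(v, hi);

    /* SHUFPS: broadcast lane 1 so that x1 + x3 sits in lane 0. */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));

    /* ADDSS: scalar add in lane 0 gives (x0 + x2) + (x1 + x3). */
    __m128 total = _mm_add_ss(sum2, lane1);

    /* Extract lane 0 as a plain float. */
    return _mm_cvtss_f32(total);
}
```

In assembly this should map to roughly MOVHLPS, ADDPS, SHUFPS, ADDSS, plus a register copy or two. Note that the elements are added pairwise, so the rounding can differ slightly from a strict left-to-right sum.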
Yeah :) I needed SSE though. However, I got the answer on the FASM forums here:
http://board.flatassembler.net/topic.php?t=5037
Cheers.