Hi:

I was wondering what the fastest way is to add up the elements of a single xmm register.
E.g. an xmm register contains 4 single-precision floating-point values:
x3,x2,x1,x0
How would you get the result of x3+x2+x1+x0?
If I recall correctly, SSE3 has a faster way of doing this 'horizontal' addition, but I need the code in plain SSE (not SSE2 etc.).


Cheers
Posted on 2006-04-01 11:18:04 by Raedwulf
SSE3 has "HADDPS", but that is no help if you are limited to plain SSE. With SSE or SSE2 you have to do the additions interleaved with shuffles.
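
A minimal sketch of the shuffle-and-add idea, written with C intrinsics from xmmintrin.h so that it uses SSE (SSE1) instructions only. The function name hsum_sse1 is made up for illustration, and this is just one possible sequence, not necessarily the fastest on every CPU and not necessarily the one from the FASM thread. The comments name the instruction each intrinsic normally maps to.

```c
#include <xmmintrin.h>  /* SSE (SSE1) intrinsics only, no SSE2/SSE3 */

/* Hypothetical helper: returns x3+x2+x1+x0 for v = [x3, x2, x1, x0]. */
static float hsum_sse1(__m128 v)
{
    /* MOVHLPS: copy the high pair of v into the low pair -> [x3, x2, x3, x2] */
    __m128 hi = _mm_movehl_ps(v, v);

    /* ADDPS: lane 0 = x0+x2, lane 1 = x1+x3 (upper lanes are not needed) */
    __m128 pairs = _mm_add_ps(v, hi);

    /* SHUFPS with 0x55: broadcast lane 1, so lane 0 now holds x1+x3 */
    __m128 lane1 = _mm_shuffle_ps(pairs, pairs, _MM_SHUFFLE(1, 1, 1, 1));

    /* ADDSS: add only the lowest lane -> lane 0 = (x0+x2) + (x1+x3) */
    __m128 total = _mm_add_ss(pairs, lane1);

    /* Extract lane 0 as a scalar float */
    return _mm_cvtss_f32(total);
}
```

For example, hsum_sse1(_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f)) adds 1+3 and 2+4 first and then the two partial sums, giving 10. Note that the order of the additions differs from a plain left-to-right sum, so results can differ in the last bits for values that do not add exactly.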
Posted on 2006-04-03 04:22:43 by ti_mo_n
Yeah :) I needed SSE though. However, I got the answer on the FASM forums here:

http://board.flatassembler.net/topic.php?t=5037

Cheers.
Posted on 2006-04-03 10:15:54 by Raedwulf