byteswapping xmm / sse2 registers (without using BSWAP)

aka, switching little endian to big endian and back.

Intro:

SSE2 has no BSWAP instruction for the 128-bit XMM registers. In fact, SSE2 has no
way to shuffle bytes directly (a byte shuffle, PSHUFB, only arrives with SSSE3).
You can shuffle quadwords, doublewords, and words, but not bytes. What if you
have four 32-bit values in an XMM register that you want to BSWAP?

You could copy each value into a 32-bit general purpose register, BSWAP it, then copy it
back into the 128-bit register. However, you can also do it another way, without using
any general purpose 32-bit registers or BSWAP. Instead you can use the SSE2 word shuffle
instructions (PSHUFLW / PSHUFHW).
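
For comparison, here is a rough C sketch of the "round trip through 32-bit registers" approach described above. It is only a baseline to check the SSE2 version against, not code from this post: `bswap32`, `bswap_dwords_scalar` and `check_scalar` are made-up names, and the portable shift/mask swap stands in for the BSWAP instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: portable 32-bit byte swap, standing in for BSWAP. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Spill the register, swap each dword with scalar code, reload it. */
static __m128i bswap_dwords_scalar(__m128i x)
{
    uint32_t lane[4];
    _mm_storeu_si128((__m128i *)lane, x);
    for (int i = 0; i < 4; i++)
        lane[i] = bswap32(lane[i]);
    return _mm_loadu_si128((const __m128i *)lane);
}

/* Self-check using the ABCD EFGH IJKL MNOP example (bytes in memory order). */
static void check_scalar(void)
{
    const char in[17] = "ABCDEFGHIJKLMNOP";
    char out[16];
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, bswap_dwords_scalar(x));
    assert(memcmp(out, "DCBAHGFELKJIPONM", 16) == 0);
}
```

Calling `check_scalar()` from a `main` should pass silently.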

So: given one XMM register (xmm5 here), swap the bytes within each of the four 32-bit doublewords inside it.

This uses two temporary registers (xmm0 and xmm1).


       movdqa  xmm0, xmm5 ; two working copies of the input
       movdqa  xmm1, xmm5
       pxor    xmm5, xmm5 ; xmm5 = 0
       punpckhbw xmm0, xmm5 ; interleave '0' with the high 8 bytes of the original
       punpcklbw xmm1, xmm5 ;  and with the low 8 bytes, so they become words
       pshuflw xmm0, xmm0, 0b00_01_10_11 ; swap the words by shuffling (reverse each group of 4)
       pshufhw xmm0, xmm0, 0b00_01_10_11
       pshuflw xmm1, xmm1, 0b00_01_10_11
       pshufhw xmm1, xmm1, 0b00_01_10_11
       packuswb xmm1, xmm0 ; pack/de-interleave, ie make the words back into bytes.

       movdqa  xmm5, xmm1 ; result back into xmm5


how it works

In XMM / SSE2 you can't swap bytes directly, but you can...

1. swap words
2. 'inflate' bytes into words, by interleaving with 0
3. 'deflate' words back into bytes, chopping off the 0

EX:

input 16 bytes / 128-bits:

input register bytes: ABCD EFGH IJKL MNOP

inflate / unpack / interleave with 0: (PUNPCKHBW, PUNPCKLBW)
temp register 1: 0A0B 0C0D 0E0F 0G0H
temp register 2: 0I0J 0K0L 0M0N 0O0P

swap words: (PSHUFLW, PSHUFHW)
temp register 1: 0D0C 0B0A 0H0G 0F0E
temp register 2: 0L0K 0J0I 0P0O 0N0M

deflate / pack / de-interleave (PACKUSWB)
output register bytes: DCBA HGFE LKJI PONM
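
The three steps above can be sketched in C with SSE2 intrinsics, which makes the example easy to compile and check. Each intrinsic corresponds to one of the instructions named above. The function names `bswap_dwords_sse2` and `check_sse2` are made up for illustration, and the self-check assumes ABCD...MNOP are the register's bytes in memory order.

```c
#include <assert.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Byte-swap each of the four 32-bit lanes using only SSE2. */
static __m128i bswap_dwords_sse2(__m128i x)
{
    const __m128i zero = _mm_setzero_si128();  /* pxor */

    /* inflate: interleave bytes with 0 so every byte becomes a word */
    __m128i lo = _mm_unpacklo_epi8(x, zero);   /* punpcklbw */
    __m128i hi = _mm_unpackhi_epi8(x, zero);   /* punpckhbw */

    /* swap: 0x1B == 0b00_01_10_11 reverses the four words of a 64-bit half */
    lo = _mm_shufflelo_epi16(lo, 0x1B);        /* pshuflw */
    lo = _mm_shufflehi_epi16(lo, 0x1B);        /* pshufhw */
    hi = _mm_shufflelo_epi16(hi, 0x1B);
    hi = _mm_shufflehi_epi16(hi, 0x1B);

    /* deflate: every word is 0..255, so unsigned saturation changes nothing */
    return _mm_packus_epi16(lo, hi);           /* packuswb */
}

/* Self-check against the worked example. */
static void check_sse2(void)
{
    const char in[17] = "ABCDEFGHIJKLMNOP";
    char out[16];
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, bswap_dwords_sse2(x));
    assert(memcmp(out, "DCBAHGFELKJIPONM", 16) == 0);
}
```

Because the inflated words always have a zero high byte, the saturating pack cannot clip anything, even for input bytes of 0x80 and above.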


Bonus:

If you also want to swap the order of doublewords within the 128-bit register,
you can use one PSHUFD.
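
A small C sketch of that PSHUFD step, using the `_mm_shuffle_epi32` intrinsic. The post does not give the immediate; 0x1B (0b00_01_10_11) is assumed here because it reverses the order of all four doublewords. `reverse_dword_order` and `check_reverse` are made-up names.

```c
#include <assert.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Reverse the order of the four dwords; bytes inside each dword stay put. */
static __m128i reverse_dword_order(__m128i x)
{
    return _mm_shuffle_epi32(x, 0x1B);  /* pshufd x, x, 0b00_01_10_11 */
}

/* Self-check: ABCD EFGH IJKL MNOP -> MNOP IJKL EFGH ABCD (memory order). */
static void check_reverse(void)
{
    const char in[17] = "ABCDEFGHIJKLMNOP";
    char out[16];
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, reverse_dword_order(x));
    assert(memcmp(out, "MNOPIJKLEFGHABCD", 16) == 0);
}
```

Combined with the per-doubleword byte swap, this gives a byte reversal of the whole 128-bit register.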


NASM MACRO:
; xmmbswap target, temp1, temp2
;   %1 = register to byte-swap (input and output)
;   %2, %3 = scratch registers (clobbered)
; example: xmmbswap xmm5, xmm1, xmm0
%macro xmmbswap 3
       movdqa  %3, %1
       movdqa  %2, %1
       pxor    %1, %1
       punpckhbw %3, %1 ; interleave '0' with bytes of original
       punpcklbw %2, %1 ;  so they become words
       pshuflw %3, %3, 0b00_01_10_11 ; swap the words by shuffling
       pshufhw %3, %3, 0b00_01_10_11
       pshuflw %2, %2, 0b00_01_10_11
       pshufhw %2, %2, 0b00_01_10_11
       packuswb %2, %3 ; pack/de-interleave, ie make the words back into bytes.
       movdqa  %1, %2
%endmacro

END

Posted on 2009-12-27 13:46:47 by decora
I took the liberty of splitting your post into multiple text/code parts for easier reading - hope you don't mind :)
Posted on 2009-12-27 15:51:07 by f0dder
How about:

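; note: with the pshufd this reverses all 16 bytes; drop it to swap only within each dword
pshufd  xmm5, xmm5, 00011011b ; reverse the order of the four dwords
pshuflw xmm5, xmm5, 10110001b ; swap the two words within each dword (low half)
pshufhw xmm5, xmm5, 10110001b ; same for the high half
movdqa  xmm0, xmm5
psrlw   xmm0, 8               ; high byte of each word -> low byte
psllw   xmm5, 8               ; low byte of each word -> high byte
por     xmm5, xmm0            ; bytes within each word are now swapped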

8)
Posted on 2009-12-27 17:23:03 by drizz
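
For anyone who wants to check drizz's shift-based variant, here is a hedged C sketch with SSE2 intrinsics. The function names are made up. The per-doubleword swap is split out from the initial PSHUFD, since the sequence as posted also reverses the doubleword order and so produces a full 16-byte reversal.

```c
#include <assert.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Byte-swap each dword: swap words within dwords, then bytes within words. */
static __m128i bswap_dwords_shift(__m128i x)
{
    /* 0xB1 == 10110001b: exchange the two words inside each dword */
    x = _mm_shufflelo_epi16(x, 0xB1);            /* pshuflw */
    x = _mm_shufflehi_epi16(x, 0xB1);            /* pshufhw */
    /* psllw / psrlw / por: exchange the two bytes inside each word */
    return _mm_or_si128(_mm_slli_epi16(x, 8), _mm_srli_epi16(x, 8));
}

/* The sequence as posted: pshufd first, so all 16 bytes end up reversed. */
static __m128i reverse_all_bytes(__m128i x)
{
    x = _mm_shuffle_epi32(x, 0x1B);              /* pshufd, 00011011b */
    return bswap_dwords_shift(x);
}

/* Self-check with bytes ABCD EFGH IJKL MNOP in memory order. */
static void check_shift(void)
{
    const char in[17] = "ABCDEFGHIJKLMNOP";
    char out[16];
    __m128i x = _mm_loadu_si128((const __m128i *)in);

    _mm_storeu_si128((__m128i *)out, bswap_dwords_shift(x));
    assert(memcmp(out, "DCBAHGFELKJIPONM", 16) == 0);

    _mm_storeu_si128((__m128i *)out, reverse_all_bytes(x));
    assert(memcmp(out, "PONMLKJIHGFEDCBA", 16) == 0);
}
```

This version needs one temporary instead of two and no zeroed register, at the cost of two shifts and an OR.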