Today I started looking at MMX coding, and everything is going fine. So far there is just one thing I don't understand: how do packing and unpacking work? How and when should I use them?

I would appreciate an example that shows how the individual bytes change when packing/unpacking. I think it's much easier to understand that way.

Posted on 2002-10-14 15:18:59 by Delight
Often the pack/unpack instructions are used to move data around - lining it up for other operations.
That is the case in the following code, which transforms vectors of four 16-bit fixed-point numbers by a 3x4 matrix:
; [ A B C D ]
; [ E F G H ] X [ W X Y Z ] = [ AW+BX+CY+DZ EW+FX+GY+HZ IW+JX+KY+LZ ? ]
; [ I J K L ]

; 16-bit numbers are signed fixed point with 15 fraction bits:
; s.fff ffff ffff ffff ; first bit is the sign bit
NUMBER_SCALE EQU 15 ; 1 / 2^15

pMatrix EQU [esp + 8] ; 3x4 transform matrix pointer (3 rows of 4 words)
pVector EQU [esp + 12] ; source vectors pointer
iNumVec EQU [esp + 16] ; number of vectors to transform
pResult EQU [esp + 20] ; destination for transformed vectors

mov ecx,iNumVec
mov eax,pMatrix
lea edx,[ecx*8] ; size of source/dest vector buffer
neg ecx

; load entire 3x4 matrix
movq mm0,[eax + 0]
movq mm1,[eax + 8]
movq mm2,[eax + 16]

mov eax,edx
add edx,pVector
add eax,pResult
NextVect:
; Load vector (4 16-bit elements) into reg
movq mm3,[edx + ecx*8]
inc ecx ; ZF is tested by the jnz at the bottom; MMX instructions don't change EFLAGS

movq mm4,mm3 ;copy to other regs for use by 3 pmadds
pmaddwd mm3,mm0 ;multiply row0 X vector

movq mm5,mm4
pmaddwd mm4,mm1 ;multiply row1 X vector

movq mm6,mm3 ; A1 A2
pmaddwd mm5,mm2 ;multiply row2 X vector

punpckldq mm3,mm4 ; B2 A2
punpckhdq mm6,mm4 ; B1 A1

movq mm4,mm5 ;add row2 high and low order 32-bit results
punpckhdq mm5,mm5 ; high dword into both halves (psrlq mm5,32 would also do)

paddd mm3,mm6 ; B1+B2 A1+A2
paddd mm5,mm4

psrad mm3,NUMBER_SCALE-2
psrad mm5,NUMBER_SCALE-2

packssdw mm3,mm5 ; pack dwords into words
; the top word holds junk (row2 high dword added to itself);
; mask it off if the 4th element of the result matters
movq [eax + ecx*8 - 8],mm3 ; store resulting vector

jnz NextVect ; loop back to do the next one

emms ; clear the MMX state before any x87 FPU code runs
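To see what each instruction in this loop does to the individual lanes, here is a minimal model in portable C (C rather than assembly so it can be run and checked anywhere). It is a sketch under stated assumptions: the types `w4`/`dw2` and helpers such as `transform_one` are hypothetical names, each helper models the documented lane behaviour of the MMX instruction it is named after, and lane 0 is the least significant word/dword of the register.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical model of one pass through the loop, lane by lane.
   Lane 0 is the least significant word/dword of the MMX register. */
#define NUMBER_SCALE 15

typedef struct { int16_t w[4]; } w4;   /* register viewed as 4 words  */
typedef struct { int32_t d[2]; } dw2;  /* register viewed as 2 dwords */

/* keep the low 32 bits, wrapping around like the hardware does */
static int32_t wrap32(int64_t v) { return (int32_t)(uint32_t)v; }

/* pmaddwd: multiply signed words, add adjacent products into dwords */
static dw2 pmaddwd(w4 a, w4 b) {
    dw2 r;
    r.d[0] = wrap32((int64_t)a.w[0] * b.w[0] + (int64_t)a.w[1] * b.w[1]);
    r.d[1] = wrap32((int64_t)a.w[2] * b.w[2] + (int64_t)a.w[3] * b.w[3]);
    return r;
}

/* punpckldq dst, src: low dword of each   -> { dst.d0, src.d0 } */
static dw2 punpckldq(dw2 a, dw2 b) { dw2 r = {{a.d[0], b.d[0]}}; return r; }
/* punpckhdq dst, src: high dword of each  -> { dst.d1, src.d1 } */
static dw2 punpckhdq(dw2 a, dw2 b) { dw2 r = {{a.d[1], b.d[1]}}; return r; }

static dw2 paddd(dw2 a, dw2 b) {
    dw2 r = {{wrap32((int64_t)a.d[0] + b.d[0]), wrap32((int64_t)a.d[1] + b.d[1])}};
    return r;
}

/* psrad: arithmetic shift right (rounds toward minus infinity) */
static int32_t sar32(int32_t v, unsigned n) {
    return v >= 0 ? v >> n : ~(~v >> n);
}
static dw2 psrad(dw2 a, unsigned n) {
    assert(n < 32);
    dw2 r = {{sar32(a.d[0], n), sar32(a.d[1], n)}};
    return r;
}

/* packssdw dst, src: 4 dwords -> 4 words with signed saturation;
   dst supplies words 0-1, src supplies words 2-3 */
static int16_t sat16(int32_t v) {
    return (int16_t)(v > 32767 ? 32767 : v < -32768 ? -32768 : v);
}
static w4 packssdw(dw2 a, dw2 b) {
    w4 r = {{sat16(a.d[0]), sat16(a.d[1]), sat16(b.d[0]), sat16(b.d[1])}};
    return r;
}

/* one vector through the same instruction sequence as the asm loop */
static w4 transform_one(const w4 m[3], w4 v) {
    dw2 a = pmaddwd(v, m[0]);            /* mm3: A1 A2 (high, low)    */
    dw2 b = pmaddwd(v, m[1]);            /* mm4: B1 B2                */
    dw2 c = pmaddwd(v, m[2]);            /* mm5: C1 C2                */
    dw2 ab = paddd(punpckldq(a, b),      /*   B2 A2                   */
                   punpckhdq(a, b));     /* + B1 A1 = B1+B2 A1+A2     */
    dw2 cc = paddd(punpckhdq(c, c), c);  /* C1+C1 (junk)  C1+C2       */
    ab = psrad(ab, NUMBER_SCALE - 2);
    cc = psrad(cc, NUMBER_SCALE - 2);
    return packssdw(ab, cc);             /* words: A B C junk         */
}
```

With the rows {1,2,3,4}, {5,6,7,8}, {9,10,11,12} and every vector element set to 8192, the result words come out as 10, 26, 42 and 46; the last one is the junk word (row 2's high dword added to itself) that the store writes into the fourth slot.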
Other times you will need to expand words/bytes to dwords/words to perform operations on them. For unsigned data, the second operand is zero, which clears the upper part; for signed data, the second operand is the data to expand, and an arithmetic shift is then performed to sign-extend it:
; unsigned unpack of low words into dwords

pxor mm7, mm7 ; zero
punpcklwd mm0, mm7

; signed unpack of low words into signed dwords
punpcklwd mmY, mm0 ; first operand can be any MMX register
psrad mmY, 16 ; move data into low word, fill upper word with sign
Of course, a logical shift (psrld) could be used for the unsigned unpack as well, but in a loop you'll want to keep a spare register zeroed, if you have one, so the unpack takes a single instruction.
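The two unpack recipes above can be checked bit for bit with a small portable C model. This is a sketch, not MMX code: the helper names are hypothetical, a 64-bit MMX register is modelled as a `uint64_t` with lane 0 in the least significant bits, and each helper follows the documented lane behaviour of the instruction it is named after.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical model: an MMX register as a uint64_t, lane 0 in the
   least significant bits (the order x86 keeps it in memory). */
static uint16_t word_at(uint64_t r, int i) {
    assert(i >= 0 && i < 4);
    return (uint16_t)(r >> (16 * i));
}
static uint32_t dword_at(uint64_t r, int i) {
    assert(i >= 0 && i < 2);
    return (uint32_t)(r >> (32 * i));
}
static uint64_t from_dwords(uint32_t lo, uint32_t hi) {
    return (uint64_t)hi << 32 | lo;
}

/* punpcklwd dst, src: result words = dst.w0 src.w0 dst.w1 src.w1 */
static uint64_t punpcklwd(uint64_t dst, uint64_t src) {
    return (uint64_t)word_at(dst, 0)
         | (uint64_t)word_at(src, 0) << 16
         | (uint64_t)word_at(dst, 1) << 32
         | (uint64_t)word_at(src, 1) << 48;
}

/* psrld: logical shift of each dword, zeros come in from the top */
static uint64_t psrld(uint64_t r, unsigned n) {
    assert(n < 32);
    return from_dwords(dword_at(r, 0) >> n, dword_at(r, 1) >> n);
}

/* psrad: arithmetic shift of each dword, copies of the sign bit come in */
static uint32_t sar(uint32_t d, unsigned n) {
    uint32_t s = d >> n;
    if (d & 0x80000000u)
        s |= (uint32_t)~(UINT32_C(0xFFFFFFFF) >> n);
    return s;
}
static uint64_t psrad(uint64_t r, unsigned n) {
    assert(n < 32);
    return from_dwords(sar(dword_at(r, 0), n), sar(dword_at(r, 1), n));
}
```

For example, take x = 0x0004FFFD00028001 (words 0x8001, 0x0002, 0xFFFD, 0x0004, low to high). `punpcklwd(x, 0)` gives 0x0000000200008001, the two low words zero-extended. Unpacking into any junk register y with `punpcklwd(y, x)` puts x's words in the upper halves, so `psrad(..., 16)` gives 0x00000002FFFF8001 (0x8001 sign-extended to 0xFFFF8001), while `psrld(..., 16)` reproduces the zero-extended 0x0000000200008001.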
Posted on 2002-10-14 20:46:19 by bitRAKE
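For the byte-level view asked for in the question, here is a similar sketch of the byte forms (punpcklbw, punpckhbw, packuswb, packsswb), again in portable C. The `mmreg` type and helper names are hypothetical; the register is modelled as 8 bytes with b[0] as the least significant byte, and each helper follows the documented lane behaviour of the instruction it is named after.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical model: an MMX register as 8 bytes, b[0] = least
   significant byte (the order x86 stores it in memory). */
typedef struct { uint8_t b[8]; } mmreg;

/* signed word i, made of bytes 2i (low) and 2i+1 (high) */
static int word_s(mmreg r, int i) {
    assert(i >= 0 && i < 4);
    int v = r.b[2 * i] | (r.b[2 * i + 1] << 8);
    return v >= 0x8000 ? v - 0x10000 : v;
}

static int clamp(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

/* punpcklbw dst, src: low 4 bytes interleaved -> d0 s0 d1 s1 d2 s2 d3 s3 */
static mmreg punpcklbw(mmreg dst, mmreg src) {
    mmreg r;
    for (int i = 0; i < 4; i++) { r.b[2 * i] = dst.b[i]; r.b[2 * i + 1] = src.b[i]; }
    return r;
}

/* punpckhbw dst, src: high 4 bytes interleaved -> d4 s4 d5 s5 d6 s6 d7 s7 */
static mmreg punpckhbw(mmreg dst, mmreg src) {
    mmreg r;
    for (int i = 0; i < 4; i++) { r.b[2 * i] = dst.b[4 + i]; r.b[2 * i + 1] = src.b[4 + i]; }
    return r;
}

/* packuswb dst, src: dst's 4 signed words become bytes 0-3, src's become
   bytes 4-7, each clamped to 0..255 */
static mmreg packuswb(mmreg dst, mmreg src) {
    mmreg r;
    for (int i = 0; i < 4; i++) {
        r.b[i]     = (uint8_t)clamp(word_s(dst, i), 0, 255);
        r.b[4 + i] = (uint8_t)clamp(word_s(src, i), 0, 255);
    }
    return r;
}

/* packsswb: same layout, clamped to -128..127 (stored as two's-complement bytes) */
static mmreg packsswb(mmreg dst, mmreg src) {
    mmreg r;
    for (int i = 0; i < 4; i++) {
        r.b[i]     = (uint8_t)clamp(word_s(dst, i), -128, 127);
        r.b[4 + i] = (uint8_t)clamp(word_s(src, i), -128, 127);
    }
    return r;
}
```

Unpacking bytes 11 22 33 44 55 66 77 88 against a zero register gives 11 00 22 00 33 00 44 00 (punpcklbw) and 55 00 66 00 77 00 88 00 (punpckhbw), i.e. eight unsigned bytes widened to eight words across two registers. packuswb on those two registers returns the original eight bytes. A word outside the target range is clamped rather than truncated: 300 packs to 255 with packuswb and to 127 with packsswb.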