Hi!

Today I started looking at MMX coding, and everything is going fine so far. There's just one thing I don't understand: how do packing and unpacking work? How and when should I use them?

I would appreciate an example that shows how the individual bytes change when packing/unpacking. I think that would make it much easier to understand.

Thanks,
Delight
Posted on 2002-10-14 15:18:59 by Delight
Often the pack/unpack instructions are used to move data around, lining it up for other operations.
That is the case in the following code, which multiplies vectors of four 16-bit fixed-point values by a matrix of three 4-element rows:
```
; [ A B C D ]
; [ E F G H ]  X  [ W X Y Z ]  =  [ AW+BX+CY+DZ  EW+FX+GY+HZ  IW+JX+KY+LZ   ? ]
; [ I J K L ]

; 16-bit numbers are scaled to a fixed-point size of:
; 1.111 1111 1111 1111 ; first bit is the sign bit
NUMBER_SCALE EQU 15 ; 1 / 2^15

pMatrix EQU [esp +  8] ; 4x3 transform matrix pointer (3 rows of 4 words)
pVector EQU [esp + 12] ; source vectors pointer
iNumVec EQU [esp + 16] ; number of vectors to transform
pResult EQU [esp + 20] ; destination for transformed vectors

mov	ecx,iNumVec
mov	eax,pMatrix
lea	edx,[ecx*8]	; size of source/dest vector buffer

movq	mm0,[eax +  0]	; row0: A B C D
movq	mm1,[eax +  8]	; row1: E F G H
movq	mm2,[eax + 16]	; row2: I J K L

mov	eax,pResult
add	eax,edx		; eax -> end of destination buffer
add	edx,pVector	; edx -> end of source buffer
neg	ecx		; index runs from -iNumVec up to 0

NextVect:
; Load vector (4 16-bit elements) into reg
movq	mm3,[edx + ecx*8]
inc	ecx		; sets ZF for jnz (MMX instructions don't change EFLAGS)

movq	mm4,mm3		;copy to other regs for use by 3 pmadds
pmaddwd	mm3,mm0		;multiply row0 X vector

movq	mm5,mm4
pmaddwd	mm4,mm1		;multiply row1 X vector

movq	mm6,mm3		; A1 A2
pmaddwd	mm5,mm2		;multiply row2 X vector

punpckldq mm3,mm4	; B2 A2
punpckhdq mm6,mm4	; B1 A1
paddd	mm3,mm6		; B1+B2 A1+A2 = row1 and row0 dot products

movq	mm4,mm5		;add row2 high and low order 32-bit results
punpckhdq mm5,mm5	; C1 C1 (psrlq mm5,32 would also work)
paddd	mm5,mm4		; low dword = C1+C2 = row2 dot product

psrad	mm3,NUMBER_SCALE	; products are 2.30; scale back to 1.15
psrad	mm5,NUMBER_SCALE
packssdw mm3,mm5	; pack dwords into words with signed saturation
; the fourth word (the "?") is garbage; mask it off if it must be zero
movq	[eax + ecx*8 - 8],mm3 ; store resulting vector

jnz	NextVect	;then loop back to do the next one.
emms			; clear MMX state so x87 FPU code can run afterwards
```
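To watch the dwords move through one pass of the loop, here is a rough pure-Python model of the loop body, including the `paddd` steps that finish each dot product and the `psrad` that drops the 1.15 scale before packing. The functions (`lanes`, `join`, `transform`, and the ones named after instructions) are hypothetical helpers written for this sketch, not a library; each one follows the documented behaviour of the MMX instruction it is named after, with lane 0 as the least significant lane. It is meant for checking the data flow, not as a replacement for the asm.

```python
# Hypothetical pure-Python model of the MMX instructions used in the loop.
# A register is an int in 0 .. 2**64-1; lane 0 is the least significant lane.
NUMBER_SCALE = 15  # 1.15 fixed point, as in the asm


def lanes(reg, bits):
    """Split a 64-bit value into signed lanes, lane 0 first."""
    mask = (1 << bits) - 1
    out = []
    for i in range(64 // bits):
        v = (reg >> (bits * i)) & mask
        out.append(v - (1 << bits) if v >> (bits - 1) else v)
    return out


def join(values, bits):
    """Pack lane values (lane 0 first) into a 64-bit value, wrapping each lane."""
    mask = (1 << bits) - 1
    return sum((v & mask) << (bits * i) for i, v in enumerate(values))


def pmaddwd(a, b):
    wa, wb = lanes(a, 16), lanes(b, 16)
    return join([wa[0] * wb[0] + wa[1] * wb[1],
                 wa[2] * wb[2] + wa[3] * wb[3]], 32)


def punpckldq(a, b):
    return join([lanes(a, 32)[0], lanes(b, 32)[0]], 32)


def punpckhdq(a, b):
    return join([lanes(a, 32)[1], lanes(b, 32)[1]], 32)


def paddd(a, b):
    return join([x + y for x, y in zip(lanes(a, 32), lanes(b, 32))], 32)


def psrad(a, n):
    # Python's >> on negative ints is arithmetic, like psrad
    return join([x >> n for x in lanes(a, 32)], 32)


def packssdw(a, b):
    # dest words: sat(a.d0), sat(a.d1), sat(b.d0), sat(b.d1)
    return join([min(32767, max(-32768, x)) for x in lanes(a, 32) + lanes(b, 32)], 16)


def transform(rows, vec):
    """One pass of the NextVect loop body; returns the three result words."""
    mm0, mm1, mm2 = (join(r, 16) for r in rows)
    v = join(vec, 16)
    mm3, mm4, mm5 = pmaddwd(v, mm0), pmaddwd(v, mm1), pmaddwd(v, mm2)
    mm6 = punpckhdq(mm3, mm4)              # high dwords of row0/row1 products
    mm3 = punpckldq(mm3, mm4)              # low dwords of row0/row1 products
    mm3 = paddd(mm3, mm6)                  # [row0 . v, row1 . v]
    mm5 = paddd(punpckhdq(mm5, mm5), mm5)  # low dword = row2 . v
    mm3, mm5 = psrad(mm3, NUMBER_SCALE), psrad(mm5, NUMBER_SCALE)
    return lanes(packssdw(mm3, mm5), 16)[:3]  # 4th word is the "?" slot


half = 1 << 14  # 0.5 in 1.15 format
rows = [[half, 0, 0, 0], [0, half, 0, 0], [0, 0, half, half]]
print(transform(rows, [1000, -2000, 3000, 4000]))  # [500, -1000, 3500]
```

Feeding in a row and vector of values close to 1.0 (32767) shows `packssdw` clamping the too-large dword to 32767 instead of wrapping it.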
Other times you will need to expand words/bytes to dwords/words to perform operations on them. For an unsigned expansion, the second operand is a zeroed register, which clears the upper part of each result. For a signed expansion, the second operand is the data to expand, so it lands in the upper part of each result, and an arithmetic right shift then sign-extends it:
```
; unsigned unpack of low words into dwords
pxor mm7, mm7 ; zero
punpcklwd mm0, mm7 ; each low word of mm0 gets a zero upper word

; signed unpack of low words into signed dwords
punpcklwd mmY, mm0 ; first operand can be any MMX register; its old low words end up in the bottom halves
psrad mmY, 16 ; shift the data down and fill the upper word with its sign
```
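A small pure-Python model makes the byte movement of both unpacks visible. `lanes`, `join`, `punpcklwd`, `psrad` and `show` are hypothetical helpers written for this sketch (not a library). Lane 0 is the least significant word, and `show` prints the bytes most-significant first, the way a debugger's register view would. The low word 0xFFFE is 65534 read as unsigned or -2 read as signed, so the two unpacks give different dwords for it.

```python
# Hypothetical pure-Python model: a register is an int in 0 .. 2**64-1,
# lane 0 is the least significant lane (lowest memory address).


def lanes(reg, bits, signed=False):
    """Split a 64-bit value into lanes, lane 0 first."""
    mask = (1 << bits) - 1
    out = []
    for i in range(64 // bits):
        v = (reg >> (bits * i)) & mask
        if signed and v >> (bits - 1):
            v -= 1 << bits
        out.append(v)
    return out


def join(values, bits):
    """Pack lane values (lane 0 first) into a 64-bit value."""
    mask = (1 << bits) - 1
    return sum((v & mask) << (bits * i) for i, v in enumerate(values))


def punpcklwd(a, b):
    """Interleave the two low words: result words are a.w0, b.w0, a.w1, b.w1."""
    wa, wb = lanes(a, 16), lanes(b, 16)
    return join([wa[0], wb[0], wa[1], wb[1]], 16)


def psrad(a, n):
    """Arithmetic right shift of each signed dword."""
    return join([x >> n for x in lanes(a, 32, signed=True)], 32)


def show(reg):
    """Register bytes, most significant first (bytes.hex(sep) needs Python 3.8+)."""
    return reg.to_bytes(8, "big").hex(" ")


mm0 = join([0xFFFE, 0x0003, 0x1234, 0x5678], 16)  # low words: -2 and 3 if signed
zero_ext = punpcklwd(mm0, 0)                      # pxor mm7,mm7 / punpcklwd mm0,mm7
junk = 0x1111_1111_1111_1111                      # old mmY contents, shifted out below
sign_ext = psrad(punpcklwd(junk, mm0), 16)        # punpcklwd mmY,mm0 / psrad mmY,16

print(show(mm0))       # 56 78 12 34 00 03 ff fe
print(show(zero_ext))  # 00 00 00 03 00 00 ff fe
print(show(sign_ext))  # 00 00 00 03 ff ff ff fe
```

The high words of `mm0` (0x1234, 0x5678) are simply dropped by `punpcklwd`; `punpckhwd` would be the instruction to pick them up instead.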
Of course, a logical shift (psrld mmY, 16 instead of psrad) could be used for the unsigned unpack, but in a loop you'll want to keep a spare register zeroed, if you have one, so that the unpack is a single instruction.
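The two unsigned variants can be compared with a small hypothetical Python model (all helper names are made up for this sketch, and lanes are treated as unsigned so `>>` behaves like the logical `psrld`). Both leave each low word zero-extended in a dword, and the old contents of mmY do not matter because the shift pushes them out.

```python
# Hypothetical pure-Python model: a register is an int in 0 .. 2**64-1,
# lane 0 is the least significant lane.


def lanes(reg, bits):
    """Split a 64-bit value into unsigned lanes, lane 0 first."""
    mask = (1 << bits) - 1
    return [(reg >> (bits * i)) & mask for i in range(64 // bits)]


def join(values, bits):
    """Pack lane values (lane 0 first) into a 64-bit value."""
    mask = (1 << bits) - 1
    return sum((v & mask) << (bits * i) for i, v in enumerate(values))


def punpcklwd(a, b):
    """Interleave the two low words: result words are a.w0, b.w0, a.w1, b.w1."""
    wa, wb = lanes(a, 16), lanes(b, 16)
    return join([wa[0], wb[0], wa[1], wb[1]], 16)


def psrld(a, n):
    """Logical right shift: lanes are unsigned, so zeros shift in from the top."""
    return join([x >> n for x in lanes(a, 32)], 32)


def unpack_with_zero(data):
    # pxor mm7,mm7 (once, outside the loop) / punpcklwd mm0,mm7
    return punpcklwd(data, 0)


def unpack_with_shift(data, old_mmy):
    # punpcklwd mmY,mm0 / psrld mmY,16 (two instructions, no zero register)
    return psrld(punpcklwd(old_mmy, data), 16)


data = 0x5678_1234_8000_7FFF
print(hex(unpack_with_zero(data)))                          # 0x800000007fff
print(hex(unpack_with_shift(data, 0xDEAD_BEEF_DEAD_BEEF)))  # 0x800000007fff
```

The results are identical; the difference is only instruction count inside the loop, one `punpcklwd` per unpack with a zero register versus `punpcklwd` plus `psrld` without one.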
Posted on 2002-10-14 20:46:19 by bitRAKE