Some MMX operations can take a memory address as operand, e.g.

punpcklbw mm0, qword ptr

Some of these operations only use the lowest 32 bits of the memory data. My question concerns memory alignment: do the memory accesses have to be qword aligned, or can they be dword aligned?

The reason I ask is because

punpcklbw mm0, dword ptr ; is illegal.
Posted on 2002-06-17 16:07:12 by cavello
IIRC, MOVD is the only MMX instruction that
works on DWORDs - use QWORD for the rest.
Posted on 2002-06-17 18:00:31 by bitRAKE
iirc MMX data can have any alignment, but should be aligned to 8byte
boundaries for speed issues.
Some(?) of the SSE (or was it SSE2 only?) instructions require
16-byte aligned data or the instructions will cause exceptions.
Posted on 2002-06-17 18:25:22 by f0dder
bitRAKE, I think you are mistaken. Several MMX instructions use DWORD data pointed to by a mem32 operand. E.g. punpcklxx ops.

Now, if we examine the 20726 document from AMD, p.110, which describes punpcklwd, the diagram shows how DWORD data in memory need only be DWORD aligned. Yet, a qword ptr is used as operand. Why?

It says on p.109 of said document:
"The PUNPCKLWD instruction unpacks and interleaves two 16-bit values from the low 32 bits of the source operand (an MMX register or a 32-bit memory location) and two 16-bit values from the low 32 bits of the destination operand (an MMX register)."

And on p.110 it illustrates how the memory might be DWORD aligned (not explicitly stated, but implied by the diagram).

Imagine the byte series pointed to by ESI that starts on a qword aligned address:

1 2 3 4 5 6 7 8 (low -> high memory address)

movd mm0, [esi] ; 0000 4321
movd mm1, [esi+4] ; 0000 8765
punpcklbw mm0, mm1 ; 8473 6251

provides the same result as

movd mm0, [esi] ; 0000 4321
punpcklbw mm0, [esi+4] ; 8473 6251
;will this cause a misalignment penalty? How do I check?

The question remains: Does [esi+4] need to be qword aligned OR dword aligned?
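
To make the equivalence of the two sequences concrete, here is a small value-level sketch in Python. It only models which bytes end up in the register; it says nothing about timing or misalignment penalties, which is the actual open question. The helper names (movd_load, punpcklbw, punpcklbw_mem32, as_register_text) are made up for this illustration, and the offsets 0 and 4 assume the buffer starts at ESI.

```python
def movd_load(mem, offset):
    """Model MOVD mm, [mem]: read 4 bytes and zero the upper 4 byte lanes.

    The register is modelled as a list of 8 bytes, lowest lane first.
    """
    return list(mem[offset:offset + 4]) + [0, 0, 0, 0]


def punpcklbw(dst, src):
    """Model PUNPCKLBW: interleave the low 4 bytes of dst and src."""
    out = []
    for i in range(4):
        out.append(dst[i])  # even result lanes come from the destination
        out.append(src[i])  # odd result lanes come from the source
    return out


def punpcklbw_mem32(dst, mem, offset):
    """Model the load-execute form, which reads only 4 bytes from memory."""
    return punpcklbw(dst, list(mem[offset:offset + 4]))


def as_register_text(lanes):
    """Format the lanes high byte first, like the comments in the thread."""
    digits = "".join(str(b) for b in reversed(lanes))
    return digits[:4] + " " + digits[4:]


# Byte series 1..8, low -> high memory address
data = bytes([1, 2, 3, 4, 5, 6, 7, 8])

# Variant 1: two MOVD loads, then register-register PUNPCKLBW
two_loads = punpcklbw(movd_load(data, 0), movd_load(data, 4))

# Variant 2: one MOVD load, then PUNPCKLBW with a memory operand at offset 4
load_execute = punpcklbw_mem32(movd_load(data, 0), data, 4)

print(as_register_text(two_loads))
print(as_register_text(load_execute))
```

Both variants read exactly the same eight bytes, so the values match; whether the 4-byte read at offset 4 is penalised is a property of the CPU, not of this model.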

Posted on 2002-06-17 18:29:00 by cavello
cavello, you are correct (see page 171 of doc 22007):
"The width of the memory access performed by the load-execute forms of PUNPCKLBW, PUNPCKLWD, and PUNPCKLDQ is 32 bits (a DWORD), while the width of the memory access of the load-execute forms of PUNPCKHBW, PUNPCKHWD, and PUNPCKHDQ is 64 bits (a QWORD).

"This means that the alignment requirements for memory operands of PUNPCKL* instructions (DWORD alignment) are less strict than the alignment requirements for memory operands of PUNPCKH* instructions (QWORD alignment). Code can take advantage of this in order to reduce the number of misaligned loads in a program. A second advantage of using PUNPCKL* instead of PUNPCKH* is that it helps avoid size mismatches during load-to-store forwarding. Store data from either a DWORD store or the lower DWORD of a QWORD store can be bypassed inside the load/store buffer to PUNPCKL*, but only store data from a QWORD store can be bypassed to PUNPCKH*."
I always align my data by the data size or greater, so this is not a concern for me. But it is good to keep in mind for other purposes.
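
The AMD guidance quoted above can be written down as a tiny lookup, sketched here in Python. This only encodes what the quote says for the load-execute forms on AMD CPUs; it is not a model of any processor, and the function names (access_width, is_naturally_aligned, can_forward_from_store) are invented for the example. The forwarding check assumes the load starts at the same address as the store.

```python
# Load-execute access widths in bytes, as stated in the AMD quote
PUNPCKL = {"punpcklbw", "punpcklwd", "punpckldq"}  # 32-bit memory access
PUNPCKH = {"punpckhbw", "punpckhwd", "punpckhdq"}  # 64-bit memory access


def access_width(mnemonic):
    """Return the width in bytes of the memory access for a mnemonic."""
    name = mnemonic.lower()
    if name in PUNPCKL:
        return 4
    if name in PUNPCKH:
        return 8
    raise ValueError("not a PUNPCKL*/PUNPCKH* mnemonic: " + mnemonic)


def is_naturally_aligned(mnemonic, address):
    """True if address is a multiple of the instruction's access width."""
    return address % access_width(mnemonic) == 0


def can_forward_from_store(mnemonic, store_width):
    """True if the quote allows store data of this width to be bypassed.

    Assumes the load begins at the store's address, i.e. it reads the
    DWORD store itself or the lower DWORD of a QWORD store.
    """
    if access_width(mnemonic) == 4:
        return store_width in (4, 8)
    return store_width == 8


# An address that is DWORD aligned but not QWORD aligned
address = 0x1004
print(is_naturally_aligned("punpcklbw", address))
print(is_naturally_aligned("punpckhbw", address))
```

For an address such as 0x1004, the PUNPCKL* form counts as aligned while the PUNPCKH* form does not, which is the case the question was about.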
Posted on 2002-06-17 18:43:01 by bitRAKE
I just ran the code snippet above through AMD's CodeAnalyst, and it did NOT detect a misalignment on the dword-aligned [esi+4]!

This MEANS that MMX mem32 operands do NOT need to be QWORD aligned!

Can anyone confirm this? My code depends on this being true.

Thank you. Please state methods used to test this assumption.
Posted on 2002-06-17 18:43:28 by cavello
We posted at the same time - the above post is your answer for AMD CPUs. Intel flatly states 8-byte alignment on MMX in the P4 Optimization Manual.
Posted on 2002-06-17 18:48:59 by bitRAKE
Thanks for that info about the Intel MMX implementation, bitRAKE. It may mean I have to reconsider my code.

Can anyone confirm this with some simulation/practical tests? I ask this because AMD doesn't actually state the alignment requirement, and I found it only through testing. Perhaps Intel's docs aren't quite accurate. I simply can't believe the P4 would have such a strict requirement given that only DWORDs are used in some MMX ops.

Thanks for the time.
Posted on 2002-06-17 19:04:52 by cavello
cavello, the AMD quote above means:
- PUNPCKL* must be DWORD aligned.
- PUNPCKH* must be QWORD aligned.

They do say it, no? ;)
Posted on 2002-06-17 19:12:50 by bitRAKE
I'm sorry! My brain just skipped that quote completely! Way too much coffee! Thanks for the quote.
Posted on 2002-06-17 19:24:47 by cavello