hi there, the problem is the following: i need to invert a whole MMX register to get the inverted mask. the only way i have found to do this is by PXORing it with all ones (1111...1111, 64 bits).
take for example the following routine, which takes mm0 and mm2 and returns the greater of each pair of words in mm0 (assumes mm4=1111...1111):

FindMM0MM2max:
    movq mm3,mm0    ;mm3 = copy of mm0 (mm3 becomes mask)
    pcmpgtw mm3,mm2 ;mm3 has 1111... in each word where mm0 > mm2 (signed compare)
    pand mm0,mm3    ;keep greater values of mm0, rest zero
    pxor mm3,mm4    ;invert mask: mm3 has 1111... where mm0 <= mm2
    pand mm2,mm3    ;keep values of mm2 where mm2 >= mm0, rest zero
    por mm0,mm2     ;now or them together to get greater values of both.
RET

my approach to fill up mm4 with all "1"s is:

mov eax,0ffffffffh
movd mm4,eax
movq mm5,mm4
psllq mm5,32
por mm4,mm5

...
is there a faster way to invert a whole MMX register?
is there another way than xoring with all "1"s?
is there a faster way to fill a whole MMX register with all "1"s?
Posted on 2002-11-11 05:38:25 by BugByter
.data
qwerty dq -1

.code
movq mm4, qwerty


Would this work? (i'm still a newbie, still learning)
Posted on 2002-11-11 05:52:09 by Delight
maybe it would but not for my problem as i'm writing inline assembler :/
thanks anyway, have a nice day,
Posted on 2002-11-11 06:06:12 by BugByter
This should be fast enough:
pcmpeqb mm1, mm1 ; mm1= -1

pxor mm0, mm1 ; invert mm0
Posted on 2002-11-11 09:28:58 by masquer
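For anyone who wants to check the logic of the two-liner outside of assembler, here is a minimal C sketch. It is only a model under stated assumptions: a `uint64_t` stands in for the MMX register, and the names `pcmpeqb_model` and `invert_model` are made up for illustration. The point it shows is that comparing a register with itself sets every byte to FFh whatever the register held before, and XORing with that flips every bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of PCMPEQB: each byte lane of the result is 0xFF
   where the corresponding bytes of a and b are equal, otherwise 0x00. */
uint64_t pcmpeqb_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        if (x == y)
            r |= (uint64_t)0xFF << (8 * i);
    }
    return r;
}

/* Hypothetical model of the two-instruction invert.
   'scratch' is whatever happened to be in mm1; its value does not matter. */
uint64_t invert_model(uint64_t mm0, uint64_t scratch)
{
    uint64_t mm1 = pcmpeqb_model(scratch, scratch); /* pcmpeqb mm1, mm1 */
    return mm0 ^ mm1;                               /* pxor    mm0, mm1 */
}
```

Because the compare only involves the register itself, no constant has to be loaded from memory or built up through eax.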

The fastest way is not to do it! :)
FindMM0MM2max:

movq mm3,mm0 ;mm3 = copy of mm0 (mm3 becomes mask)
pcmpgtw mm3,mm2 ;mm3 has 1111... where mm0 > mm2
pand mm0,mm3 ;keep greater values of mm0, rest zero
pandn mm3,mm2 ;mm3 = (not mm3) and mm2: keep values of mm2 where mm0 <= mm2, rest zero
por mm0,mm3 ;now or them together to get greater values of both.
RET
I prefer the general form:
pmaxsw MACRO mreg1:REQ, mreg2:REQ, mregx:REQ

movq mregx, mreg1
pcmpgtw mregx, mreg2
pand mreg1, mregx
pandn mregx, mreg2
por mreg1, mregx
ENDM
Posted on 2002-11-11 10:49:16 by bitRAKE
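The PANDN version can be sanity-checked with a small C model. This is a sketch, not the poster's code: a `uint64_t` stands in for an MMX register holding four signed words, and all function names (`lane16`, `pcmpgtw_model`, `pandn_model`, `max_words_model`, `max_words_reference`) are invented for illustration. It steps through the same five instructions and compares the result with a per-word maximum written the obvious way.

```c
#include <assert.h>
#include <stdint.h>

/* Extract signed 16-bit lane i (0 = lowest word) from a 64-bit value. */
int16_t lane16(uint64_t v, int i)
{
    return (int16_t)(uint16_t)(v >> (16 * i));
}

/* Hypothetical model of PCMPGTW: 0xFFFF in each word where a > b (signed). */
uint64_t pcmpgtw_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++)
        if (lane16(a, i) > lane16(b, i))
            r |= (uint64_t)0xFFFF << (16 * i);
    return r;
}

/* Hypothetical model of PANDN dest, src: dest = (NOT dest) AND src. */
uint64_t pandn_model(uint64_t dest, uint64_t src)
{
    return ~dest & src;
}

/* The five-instruction sequence, one C statement per instruction. */
uint64_t max_words_model(uint64_t mm0, uint64_t mm2)
{
    uint64_t mm3 = mm0;               /* movq    mm3, mm0 */
    mm3 = pcmpgtw_model(mm3, mm2);    /* pcmpgtw mm3, mm2 */
    mm0 &= mm3;                       /* pand    mm0, mm3 */
    mm3 = pandn_model(mm3, mm2);      /* pandn   mm3, mm2 */
    return mm0 | mm3;                 /* por     mm0, mm3 */
}

/* Reference: signed maximum of each pair of words, computed directly. */
uint64_t max_words_reference(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int16_t x = lane16(a, i), y = lane16(b, i);
        int16_t m = (x > y) ? x : y;
        r |= (uint64_t)(uint16_t)m << (16 * i);
    }
    return r;
}
```

Note that mm2 is left untouched in this version, whereas the original routine overwrote it with the masked values.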
wow, thank you both a lot!
what a nice 2-liner, masquer :) gotta remember this, really nice trick!

bitrake: ahh i see ;) argl
thank you a lot, you're right, the fastest way is always not to do it ;) this solution is really obvious, duh

must be coz i dont like pandn, as it "destroys" the mask: it negates the DESTINATION register before anding, so you can only work with the inverse mask exactly one time - suxx :/
anyway, this is the best way :) thank you very much
Posted on 2002-11-11 13:22:40 by BugByter
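On the point about PANDN overwriting the mask: one way around it is to run PANDN on a copy of the mask, at the cost of one extra MOVQ per reuse. A minimal C sketch of that idea follows, with the usual caveats: a `uint64_t` stands in for the register, and `pandn_model` and `blend_model` are hypothetical names, not anything from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of PANDN dest, src: dest = (NOT dest) AND src.
   The result lands in dest, which is why the mask held there is lost. */
uint64_t pandn_model(uint64_t dest, uint64_t src)
{
    return ~dest & src;
}

/* Pick bits of a where mask is 1 and bits of b where mask is 0,
   working on a copy so the caller's mask can be used again. */
uint64_t blend_model(uint64_t mask, uint64_t a, uint64_t b)
{
    uint64_t tmp = mask;           /* movq  tmp, mask  (spare copy)      */
    tmp = pandn_model(tmp, b);     /* pandn tmp, b     (tmp = ~mask & b) */
    a &= mask;                     /* pand  a, mask    (mask unchanged)  */
    return a | tmp;                /* por   a, tmp                       */
}
```

Calling `blend_model` twice with the same `mask` corresponds to reusing one compare result for two different pairs of registers.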

ah, i forgot to ask:
is there a list of how many cycles these mmx instructions take?
would pcmpeqD be faster than pcmpeqB?
in general: are those dealing with big chunks (quadword) faster than the ones operating on bytes?
Posted on 2002-11-11 13:29:09 by BugByter

Generally speaking, the MMX instructions all take the same amount of time regardless of element size, so pcmpeqd is no faster than pcmpeqb. The execution speed is based on the number of MMX execution units within the processor: most processors can execute two instructions per cycle. Athlons can sometimes execute three instructions per clock. Don't use MOVD if you can design the algo to use MOVQ. Use pack/unpack rather than shifts, as older Pentiums have a slower MMX shift. Test, test, test. :)

I have forgotten PANDN before, too. I think Intel included it for exactly the type of use above.
Posted on 2002-11-11 14:42:14 by bitRAKE
Hi

If you have a PIII or later, try PMAXSW MM0,MM2 too. It does the work of FindMM0MM2max in a single instruction.

Bye
Posted on 2002-11-15 05:55:15 by valy
The AMD Athlon and Duron also have the PMAXSW instruction.
Posted on 2002-11-15 06:55:44 by Tomasz Grysztar
ahh, i see... took me quite a while to find good info about pmaxsw but it actually works :) thanks a lot!
Posted on 2002-11-25 11:55:11 by BugByter