hi there, the problem is the following: i need to invert a whole MMx register to get the inverted mask. the only way i saw to achieve this is by PXORing it with 1111...1111 (64x).

take for example the following routine that takes mm0 and mm2 and returns the greater words of both in mm0 (assumes mm4=1111...1111):

```
FindMM0MM2max:
    movq    mm3,mm0 ;mm3 = copy of mm0 (mm3 becomes mask)
    pcmpgtw mm3,mm2 ;mm3 has 1111... where mm0 > mm2
    pand    mm0,mm3 ;keep greater values of mm0, rest zero
    pxor    mm3,mm4 ;mm3 has 1111... where mm2 >= mm0
    pand    mm2,mm3 ;keep greater values of mm2, rest zero
    por     mm0,mm2 ;now or them together to get greater values of both.
    RET
```

my approach to fill up mm4 with all "1"s is:

```
mov   eax,0ffffffffh
movd  mm4,eax
movq  mm5,mm4
psllq mm5,32
por   mm4,mm5
...
```

is there a faster way to invert a whole MMx register?

is there another way than xoring with all "1"s?

is there a faster way to fill a whole MMx register with all "1"s?

```
.data
qwerty dq -1

.code
movq mm4, qwerty
```

Would this work? (i'm still a newbie, still learning)

maybe it would but not for my problem as i'm writing inline assembler :/

thanks anyway, have a nice day,

This should be fast enough:

```
pcmpeqb mm1, mm1 ; mm1 = -1 (every byte equals itself, so all bits get set)
pxor mm0, mm1 ; invert mm0
```
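To see why the two-liner works whatever mm1 held before, here is a minimal scalar sketch in plain C. It is only a model for checking the logic, not MMX code: a `uint64_t` stands in for an MMX register, and the names `model_pcmpeqb` and `model_invert` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PCMPEQB: each byte lane becomes 0xFF where the
   corresponding bytes of a and b are equal, otherwise 0x00. */
uint64_t model_pcmpeqb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        if (x == y)
            r |= (uint64_t)0xFF << (8 * lane);
    }
    return r;
}

/* Models "pcmpeqb mm1, mm1" followed by "pxor mm0, mm1".
   old_mm1 is whatever mm1 contained beforehand; it does not matter. */
uint64_t model_invert(uint64_t mm0, uint64_t old_mm1)
{
    uint64_t mm1 = model_pcmpeqb(old_mm1, old_mm1); /* always all ones */
    assert(mm1 == UINT64_MAX);
    return mm0 ^ mm1;                               /* same as ~mm0 */
}
```

Comparing a register with itself makes every lane equal, so the result is all ones without loading a constant from memory or going through a general-purpose register.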

```
FindMM0MM2max:
    movq    mm3,mm0 ;mm3 = copy of mm0 (mm3 becomes mask)
    pcmpgtw mm3,mm2 ;mm3 has 1111... where mm0 > mm2
    pand    mm0,mm3 ;keep greater values of mm0, rest zero
    pandn   mm3,mm2 ;keep greater values of mm2, rest zero
    por     mm0,mm3 ;now or them together to get greater values of both.
    RET
```

I prefer the general form:

```
pmaxsw MACRO mreg1:REQ, mreg2:REQ, mregx:REQ
    movq    mregx, mreg1
    pcmpgtw mregx, mreg2
    pand    mreg1, mregx
    pandn   mregx, mreg2
    por     mreg1, mregx
ENDM
```
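For anyone who wants to convince themselves that the five-instruction sequence really yields the per-word signed maximum, here is a rough scalar sketch in plain C. It models the instructions on a `uint64_t` and is not MMX code; all function names (`lane_s16`, `model_pcmpgtw`, `model_max_masked`, `model_max_reference`, `check_same`) are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 16-bit lane 0..3 and sign-extend it to 32 bits. */
static int32_t lane_s16(uint64_t v, int lane)
{
    int32_t w = (int32_t)((v >> (16 * lane)) & 0xFFFF);
    return (w ^ 0x8000) - 0x8000;
}

/* Model of PCMPGTW: lane becomes 0xFFFF where a > b (signed), else 0. */
uint64_t model_pcmpgtw(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++)
        if (lane_s16(a, lane) > lane_s16(b, lane))
            r |= (uint64_t)0xFFFF << (16 * lane);
    return r;
}

/* The sequence from the thread, one C statement per instruction. */
uint64_t model_max_masked(uint64_t mm0, uint64_t mm2)
{
    uint64_t mm3 = mm0;             /* movq    mm3, mm0 */
    mm3 = model_pcmpgtw(mm3, mm2);  /* pcmpgtw mm3, mm2 */
    mm0 &= mm3;                     /* pand    mm0, mm3 */
    mm3 = ~mm3 & mm2;               /* pandn   mm3, mm2 */
    mm0 |= mm3;                     /* por     mm0, mm3 */
    return mm0;
}

/* Reference: plain per-lane signed maximum. */
uint64_t model_max_reference(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        int32_t x = lane_s16(a, lane), y = lane_s16(b, lane);
        int32_t m = x > y ? x : y;
        r |= (uint64_t)((uint32_t)m & 0xFFFFu) << (16 * lane);
    }
    return r;
}

/* Self-check helper: both ways must agree for the given inputs. */
void check_same(uint64_t a, uint64_t b)
{
    assert(model_max_masked(a, b) == model_max_reference(a, b));
}
```

Lanes where both words are equal produce a zero mask, so the value is taken from the second operand, which gives the same result either way.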

wow, thank you both a lot!

what a nice 2-liner, masquer :) gotta remember this, really nice trick!

bitrake: ahh i see ;) argl

thank you a lot, you're right, the fastest way is always not to do it ;) this solution is really obvious, duh

must be coz i dont like pandn as it "destroys" the mask because it negates the DESTINATION register before anding - so you can only work with the inverse mask exactly one time - suxx :/

anyway, this is the best way :) thank you very much
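On the point about pandn overwriting the mask: one possible workaround, sketched here as a scalar C model rather than real MMX code, is to copy the mask into a spare register before each pandn. The register roles in the comments (mm3 for the mask, mm5 as the spare) and the function names are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of PANDN dst, src: dst = (NOT dst) AND src. */
uint64_t model_pandn(uint64_t dst, uint64_t src)
{
    return ~dst & src;
}

/* Applies the inverse of one mask to two different sources.
   The extra copy costs one movq but keeps a usable mask around. */
void apply_inverse_mask_twice(uint64_t mask, uint64_t a, uint64_t b,
                              uint64_t *out_a, uint64_t *out_b)
{
    assert(out_a != 0 && out_b != 0);

    uint64_t mm3 = mask;        /* mask lives in mm3 (assumed)          */
    uint64_t mm5 = mm3;         /* movq  mm5, mm3   spare copy          */
    mm3 = model_pandn(mm3, a);  /* pandn mm3, a     mask in mm3 is gone */
    mm5 = model_pandn(mm5, b);  /* pandn mm5, b     uses the copy       */

    *out_a = mm3;
    *out_b = mm5;
}
```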

ah, i forgot to ask:

is there a list of how many cycles these mmx instructions take?

would pcmpeqD be faster than pcmpeqB?

in general: are those dealing with big chunks (quadword) faster than the ones operating on bytes?

I have forgotten PANDN before, too. I think Intel included it for exactly the kind of use shown above.

Hi

If you have a PIII or later, try PMAXSW MM0,MM2 too. It does what FindMM0MM2max does in a single instruction.

Bye

AMD Athlon and Duron also have the PMAXSW instruction.

ahh, i see... took me quite a time to find good info about pmaxsw but it actually works :) thanks a lot!