hi!

I'm missing an instruction in MMX that I need in a program: PNOT, a simple 64-bit NOT instruction.

My first idea was to solve the problem like this:



.data

notMap dq 0FFFFFFFFFFFFFFFFh

.code

movq mm1, qword ptr [notMap]
pxor mm0, mm1

but
1. I remembered bitRAKE saying that memory access is slow,
2. and I don't want to waste memory on that.

Then I got the idea to do it like this:



pcmpeqd mm1, mm1
pxor mm0, mm1
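
To double-check the idea, here is a small Python model of the two instructions (an illustration only, not MMX code; the function names are made up for this sketch). It treats an MMX register as a 64-bit integer and assumes that comparing a register with itself sets every bit, whatever the register held before.

```python
# Model of the PCMPEQD/PXOR idiom; an MMX register is a 64-bit integer here.
MASK64 = 0xFFFFFFFFFFFFFFFF

def pcmpeqd_self(mm):
    """pcmpeqd mmN, mmN: every lane equals itself, so all 64 bits are set."""
    return MASK64

def pnot(mm0, mm1_garbage=0x1234):
    """Emulated PNOT: build the all-ones mask, then XOR it into mm0."""
    mm1 = pcmpeqd_self(mm1_garbage)  # old content of mm1 does not matter
    return mm0 ^ mm1                 # pxor mm0, mm1 flips every bit

print(hex(pnot(0x00FF00FF12345678)))
```

Applying it twice gives back the original value, as expected for a NOT.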

As far as I remember, the x86 cmp instruction is a bit slow, and I wonder whether that also holds for the MMX pcmp instructions. They look like the most complex MMX instructions, so I suspect it does. Is that right? I also don't like that it requires a second register.

Does anyone know a faster method, and/or one that needs only one register, to simulate PNOT?

Cu, Jens
Posted on 2002-03-27 11:18:09 by Jens Duttke
PCMPEQB/PXOR is the fastest way I know of -- besides doing without it. :) There is the PANDN instruction that you can use in some instances. On the Athlon, 'PXOR mmreg, mem_FF' can be fast if the memory is in the cache - usually if you have an odd number of other memory accesses (cache line is 64 bytes).
Posted on 2002-03-27 11:46:14 by bitRAKE
hi!

Ok, thx bitRAKE

Maybe we can modify the problem a bit?
Or should I open a new topic? Since it is related to this one, I think I can attach it here :)

I have this problem now ...
I need to test if an (unsigned) byte value is in a specified range, like

.if (al >= 10) && (al <= 50)
mov al, 0FFh
.else
mov al, 0
.endif

The problem is that I need to do this for a block of memory with 2000000 items. Sure, I could do it with a small loop like a byte scanner, but since MMX has the pcmp instructions, which do more or less what I need and handle 8 bytes at a time, I am sure it would be faster to use them.

My problem is that whenever I try, I only get large code, which is surely damn slow. So I wonder if anyone knows a fast way to do this with MMX.

the value itself is given in mm0
the lowest possible value is in mm1
the highest in mm2
all other registers are free for use
the result should be given in mm0

It's like The Svin's 'Logic' topic, but with a more practical use ... at least for me :grin:

Btw, did I already mention that the compare needs to be unsigned? pcmpgtb seems to be signed.

Cu, Jens
Posted on 2002-03-27 12:03:47 by Jens Duttke
Use pcmpgtb
Posted on 2002-03-27 12:08:34 by The Svin
1. sub 10 with wrap-around (ie no saturate)
2. cmpgtb 40
3. done

Edit: Ignore that brain fart above :)

1. pcmpgtb mm0, (09) ; min - 1
2. pcmpgtb mm1, (50) ; max
3. pandn mm1, mm0 ; (mm0) & NOT (mm1) :)
Posted on 2002-03-27 12:14:54 by bitRAKE
hi!


Use pcmpgtb


yup ... but there are several problems:

1. pcmpgtb is signed, while I need an unsigned compare (I could use one of the psub instructions, but that would need a memory access to load the value to subtract)
2. there is no pcmpltb, so that would have to be handled with a pnot, which doesn't exist either
3. pcmpgtb only catches values above the lowest value, so I would need to combine it with a pcmpeqb (using por)

And if I do all of that, the code gets extremely long and is (maybe) slower than a 'byte scanner'.
So I wonder if there is a trick to do it faster.

Cu, Jens
Posted on 2002-03-27 12:15:00 by Jens Duttke
hi!

That's the smallest code I can think of:



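; Init.
pcmpeqd mm7, mm7 ; mm7 <- FFh in every byte

pxor mm6, mm6 ; mm6 <- 80h in every byte:
psubb mm6, mm7 ; 00h - FFh = 01h
psllq mm6, 7 ; 01h << 7 = 80h

; Needs to be done on each loop
; (min and max only have to be converted once, before the loop)
psubb mm0, mm6 ; unsigned to signed
psubb mm1, mm6
psubb mm2, mm6

movq mm3, mm0
movq mm4, mm0

pcmpgtb mm4, mm1 ; val > min
pcmpeqb mm3, mm1 ; val == min
por mm4, mm3 ; val >= min

pcmpgtb mm0, mm2 ; val > max
pandn mm0, mm4 ; NOT (val > max) AND (val >= min)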
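
For anyone who wants to convince themselves that this sequence is correct, here is a one-lane model in Python (a sketch, not MMX code; all helper names are invented for this example). It assumes the usual semantics: psubb wraps modulo 256, pcmpgtb compares as signed bytes, and pandn computes (NOT dest) AND src.

```python
def signed(b):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return b - 256 if b >= 0x80 else b

def bias(b):
    """psubb with 80h: maps unsigned order onto signed order (wraps mod 256)."""
    return (b - 0x80) & 0xFF

def gt(a, b):
    """One lane of pcmpgtb: FFh if a > b as signed bytes, else 00h."""
    return 0xFF if signed(a) > signed(b) else 0x00

def eq(a, b):
    """One lane of pcmpeqb."""
    return 0xFF if a == b else 0x00

def range_mask(val, lo, hi):
    v, l, h = bias(val), bias(lo), bias(hi)
    ge_lo = gt(v, l) | eq(v, l)    # por: val >= lo
    gt_hi = gt(v, h)               # val > hi
    return ~gt_hi & ge_lo & 0xFF   # pandn: NOT (val > hi) AND (val >= lo)

print(hex(range_mask(0xE0, 0x05, 0xEE)), hex(range_mask(0x04, 0x05, 0xEE)))
```

The model agrees with a plain `lo <= val <= hi` test for every byte value, including limits at 0 and 255.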


But I think that's too much code for a simple range test ...
Isn't it?

Cu, Jens
Posted on 2002-03-27 12:40:07 by Jens Duttke
movq mm3,mm0

pcmpgtb mm0, mm1 ; mm1 is (min-1)
pcmpgtb mm3, mm2 ; max
pandn mm3, mm0 ; (mm0) & NOT (mm3), result ends up in mm3
Posted on 2002-03-27 12:45:40 by bitRAKE
You are right, it's signed.
But here:
there isn't a pcmpltb, so that need to be handled with a pnot


No, you don't need it.
a < x && x < b
.if (al >= 10) && (al <= 50)
.data
a dq 0909090909090909h ;10-1 in each of the 8 bytes
b dq 3333333333333333h ;50+1 in each of the 8 bytes
.code

movq mm1,b
movq mm0,eightbytes

;---------------------------
pcmpgtb mm1,mm0 ;
pcmpgtb mm0,b ;generate FF if conditions met
pand mm1,mm0 ;
;----------------------------

movq eightbytes,mm1

Result: bytes with 9 < x < 51 become 0FFh, all others become 00.
Posted on 2002-03-27 12:47:21 by The Svin
hi!


movq mm3,mm0

pcmpgtb mm0, mm1 ; mm1 is (min-1)
pcmpgtb mm3, mm2 ; max
pandn mm3, mm0 ; (mm0) & NOT (mm3)


Test this for example with these values :

mm0 = E0h
mm1 = 05h
mm2 = EEh

You don't need to test it ... I've already done it :)
And the result is 0, but it should be FF. The problem is that pcmpgtb is signed, while my values are unsigned.
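
The failing case can be reproduced with a one-lane Python model (a sketch only; the helper names are invented here, and pcmpgtb is assumed to compare signed bytes):

```python
def signed(b):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return b - 256 if b >= 0x80 else b

def pcmpgtb(a, b):
    """One lane of pcmpgtb: FFh if a > b as signed bytes, else 00h."""
    return 0xFF if signed(a) > signed(b) else 0x00

val, lo_minus_1, hi = 0xE0, 0x05, 0xEE

above_lo = pcmpgtb(val, lo_minus_1)     # E0h is -32 as signed, so not > 5
above_hi = pcmpgtb(val, hi)             # -32 > -18 is false as well
result = ~above_hi & above_lo & 0xFF    # pandn: NOT (above_hi) AND above_lo

print(hex(above_lo), hex(above_hi), hex(result))
```

The first compare already fails, because E0h is treated as a negative number, so the final mask is 0 although E0h lies between 05h and EEh as unsigned values.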

Cu, Jens
Posted on 2002-03-27 12:49:01 by Jens Duttke
You said: 9 < x < 51
...so I posted a solution for that. :)
Posted on 2002-03-27 12:54:44 by bitRAKE
hi!


.data
a dq 0909090909090909h ;10-1
b dq 3333333333333333h ;50+1
.code

movq mm1,b
movq mm0,eightbytes

;---------------------------
pcmpgtb mm1,mm0 ;
pcmpgtb mm0,b ;generate FF if conditions met
pand mm1,mm0 ;
;----------------------------

movq eightbytes,mm1

I wonder where in your code you use a, and what eightbytes is.

If eightbytes is the source and you don't need a, it will not work.

Cu, Jens
Posted on 2002-03-27 12:54:58 by Jens Duttke
hi!


You said: 9 < x < 51
...so I posted a solution for that. :)

That was just an example ... the min/max values are given by the user, in the range of 0 to 255. :)

Cu, Jens
Posted on 2002-03-27 12:56:30 by Jens Duttke
; Need to be done on each loop

psubb mm0, mm6 ; unsigned to signed
psubb mm1, mm6
psubb mm2, mm6

movq mm3,mm0
pcmpgtb mm0, mm1 ; mm1 is (min-1)
pcmpgtb mm3, mm2 ; max
pandn mm3, mm0 ; (mm0) & NOT (mm3)
Posted on 2002-03-27 13:01:44 by bitRAKE
I wonder where in your code, you use and what eightbytes is.


Typo :)
:
.data
a dq 0909090909090909h ;10-1 in each of the 8 bytes
b dq 3333333333333333h ;50+1 in each of the 8 bytes
.code

movq mm1,b
movq mm0,eightbytes

;---------------------------
pcmpgtb mm1,mm0 ;
pcmpgtb mm0,a ;generate FF if conditions met
pand mm1,mm0 ;
;----------------------------

movq eightbytes,mm1

About eightbytes: as you probably understood, both bitRAKE and I thought that the limits were predefined, and gave you examples for those limits.
I also thought that you need to set the bytes in an array to FF if the conditions are met and to 00 if not. eightbytes is an abstract name for
a chunk of that array that you need to fill with FF and 00 under the above rules (if 9 < byte < 51 then byte := FF else byte := 00).
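
A one-lane Python model of the fixed-limit version (a sketch with invented helper names, assuming pcmpgtb compares signed bytes) shows why the signed compare is harmless here: both limits are below 80h, and every byte from 80h upwards counts as negative, so it fails the "greater than 9" test and ends up as 00, which is what is wanted.

```python
def signed(b):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return b - 256 if b >= 0x80 else b

def pcmpgtb(a, b):
    """One lane of pcmpgtb: FFh if a > b as signed bytes, else 00h."""
    return 0xFF if signed(a) > signed(b) else 0x00

A = 9    # min - 1
B = 51   # max + 1

def mask(x):
    below_b = pcmpgtb(B, x)   # pcmpgtb mm1, mm0 with mm1 = b
    above_a = pcmpgtb(x, A)   # pcmpgtb mm0, a
    return below_b & above_a  # pand

print([hex(mask(x)) for x in (9, 10, 50, 51, 0xE0)])
```

This only holds while both limits stay below 80h; user-supplied limits in the full 0 to 255 range need the unsigned handling discussed in the rest of the thread.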
Posted on 2002-03-27 13:07:47 by The Svin
hi!


; Need to be done on each loop

psubb mm0, mm6 ; unsigned to signed
psubb mm1, mm6
psubb mm2, mm6

movq mm3,mm0
pcmpgtb mm0, mm1 ; mm1 is (min-1)
pcmpgtb mm3, mm2 ; max
pandn mm3, mm0 ; (mm0) & NOT (mm3)
There is one problem in your code: if min = 0, it will fail.
And if you correct that, it will be the same code as mine (that's why I do the pcmpeqb, so that I don't need min - 1, which does not work if min = 0).
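
The min = 0 case can be traced with a few lines of Python (a sketch of a single byte lane, with invented names, assuming psubb wraps modulo 256 and pcmpgtb is a signed compare):

```python
def signed(b):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return b - 256 if b >= 0x80 else b

lo = 0
lo_minus_1 = (lo - 1) & 0xFF          # wraps around to FFh
biased = (lo_minus_1 - 0x80) & 0xFF   # after "unsigned to signed": 7Fh = +127

# Values (after the same bias) that pcmpgtb would report as greater:
hits = [v for v in range(256) if signed((v - 0x80) & 0xFF) > signed(biased)]

print(hex(lo_minus_1), hex(biased), hits)
```

Since +127 is the largest signed byte, no value can compare greater, and the range test rejects everything.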

You see ... it isn't that easy ? :grin:

Cu, Jens
Posted on 2002-03-27 13:12:11 by Jens Duttke

You see ... it isn't that easy ? :grin:
Darn, I know there is an easy way to handle this - I feel it in my bones. Let me think through lunch on it - see if a meal will help my brain create some solution.
Posted on 2002-03-27 13:47:14 by bitRAKE
hi!

Ok, got it :


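; Init.
pcmpeqd mm7, mm7 ; mm7 <- FFh in every byte

pxor mm6, mm6 ; mm6 <- 80h in every byte
psubb mm6, mm7
psllq mm6, 7

psubb mm2, mm1 ; max - min
psubb mm2, mm6 ; unsigned to signed

; Needs to be done on each loop
psubb mm0, mm1 ; val - min (wraps around if val < min)
psubb mm0, mm6 ; unsigned to signed

pcmpgtb mm0, mm2 ; FFh where (val - min) > (max - min), i.e. out of range
pxor mm0, mm7 ; invert: FFh where min <= val <= max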
This seems to work and is much smaller/faster ... maybe it's possible to optimize that even more ? :)
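
Why this works: after subtracting min with wrap-around, every in-range value lands in 0 .. (max - min), while values below min wrap to something larger than that. A one-lane Python model (a sketch with invented helper names; it assumes min <= max and the usual psubb/pcmpgtb semantics) can be checked against the plain comparison:

```python
def signed(b):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return b - 256 if b >= 0x80 else b

def psubb(a, b):
    """One lane of psubb: subtraction that wraps modulo 256."""
    return (a - b) & 0xFF

def range_mask(val, lo, hi):
    span = psubb(psubb(hi, lo), 0x80)    # init: (max - min), then bias
    d = psubb(psubb(val, lo), 0x80)      # loop: (val - min), then bias
    above = 0xFF if signed(d) > signed(span) else 0x00   # pcmpgtb
    return above ^ 0xFF                  # pxor with FFh inverts the mask

print(hex(range_mask(0xE0, 0x05, 0xEE)), hex(range_mask(0x00, 0x05, 0xEE)))
```

If min > max, the same code accepts the wrapped-around range (min .. 255 and 0 .. max), so the caller has to make sure the limits are ordered.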

Cu, Jens
Posted on 2002-03-27 13:50:04 by Jens Duttke
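min<=value<=max

; here mm2 = value, mm1 = max, mm0 = min
; pminub/pmaxub need the SSE (or Athlon extended MMX) integer extensions
movq mm3,mm2 ; value
pminub mm2,mm1 ; min(value, max)
pmaxub mm3,mm0 ; max(value, min)
pcmpeqb mm2,mm3 ; result: equal only if min <= value <= max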
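
The idea behind the min/max version: clamping the value from above and from below gives the same byte only when neither clamp changes it. A one-lane Python model (a sketch; the function name is invented, and Python's built-in min and max stand in for pminub and pmaxub on unsigned bytes):

```python
def range_mask(val, lo, hi):
    clamped_to_hi = min(val, hi)   # pminub: unsigned minimum
    clamped_to_lo = max(val, lo)   # pmaxub: unsigned maximum
    # pcmpeqb: FFh if both clamps left the value untouched
    return 0xFF if clamped_to_hi == clamped_to_lo else 0x00

print(hex(range_mask(0xE0, 0x05, 0xEE)), hex(range_mask(0xEF, 0x05, 0xEE)))
```

No sign conversion is needed, because both instructions already work on unsigned bytes.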
Posted on 2002-03-27 14:29:00 by Nexo
; Need to be done on each loop

psubb mm0, mm1 ; val - min
psubb mm0, mm6 ; unsigned to signed
Of course, these can be combined into one psubb: add mm6 to mm1 once in the init and subtract that sum in the loop.
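
Because psubb wraps modulo 256, subtracting min and then 80h gives the same byte as subtracting (min + 80h) in one go. A quick Python check of that identity for a single lane (a sketch; the function names are invented for this example):

```python
def two_steps(val, lo):
    """psubb val, min followed by psubb val, 80h."""
    d = (val - lo) & 0xFF
    return (d - 0x80) & 0xFF

def one_step(val, lo):
    """Single psubb with a constant prepared once before the loop."""
    k = (lo + 0x80) & 0xFF    # paddb in the init: min + 80h
    return (val - k) & 0xFF

same = all(two_steps(v, lo) == one_step(v, lo)
           for v in range(256) for lo in range(256))
print(same)
```

This saves one instruction per 8 bytes in the loop at the cost of one extra instruction in the init.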
Posted on 2002-03-27 14:29:16 by bitRAKE