Hi there, I'm getting really mad about this one:
I have an MMX register holding four word values. I want to integer-divide all four words in parallel by an arbitrary integer (say, up to 49, in case that allows any optimisations). The divisor always stays the same.
How do I accomplish this? It seems to me that there is no integer division among the MMX instructions (division by a power of 2 with a right shift is absolutely clear to me).

Or is the only thing I can do converting my integers to floating point with something like the 3DNow! instruction
PI2FW
then dividing, and truncating back to integers with
PF2IW
?

But then how would I do the division? I read about PFRCP, which calculates an approximation of 1/x that could then be multiplied by the number to get number/x. But it doesn't work on words!?
Additionally, on which CPUs do 3DNow! instructions run? Only Athlons and Durons (that would be kind of crappy :/ )?
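A scalar sketch of that reciprocal route in C, assuming a plain `1.0f / d` in place of PFRCP (which only returns an approximation of 1/x) and ordinary C conversions in place of the word/float conversions. Because a rounded reciprocal can leave the truncated product one short or one over, the sketch corrects each estimate against the divisor; the function names are made up for illustration.

```c
#include <stdint.h>

/* One word lane: convert to float, multiply by the precomputed reciprocal,
 * truncate back to an integer, then fix up the estimate.
 * Assumption: recip is 1.0f / d here; PFRCP would give a coarser value,
 * which the fix-up loops would also absorb. */
static uint16_t div_by_float_recip(uint16_t x, float recip, uint16_t d) {
    int32_t q = (int32_t)((float)x * recip);  /* C cast truncates toward zero */
    while (q > 0 && q * d > x) q--;           /* estimate one too big */
    while ((q + 1) * d <= x) q++;             /* estimate one too small */
    return (uint16_t)q;
}

/* Four lanes sharing one divisor: the reciprocal is computed once. */
static void div4_float(const uint16_t in[4], uint16_t out[4], uint16_t d) {
    float recip = 1.0f / (float)d;
    for (int i = 0; i < 4; i++)
        out[i] = div_by_float_recip(in[i], recip, d);
}
```

In SIMD form the fix-up would cost a compare and a subtract per step, so this route is not cheaper than staying in integer math.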

-----
So how can I divide all four words in my MMX register by the same integer when it is not a power of 2?

Thank you so much for your support, you're really helping me out...
Posted on 2002-11-25 12:24:15 by BugByter
If you don't require great precision:
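_DATA SEGMENT

Variable_WORD WORD 49              ; the divisor
MMX_DATA      WORD 52, 51, 50, 49  ; the four words to divide
_DATA ENDS

movsx ecx, WORD PTR Variable_WORD  ; ECX = divisor
movq mm1, QWORD PTR MMX_DATA       ; MM1 = four dividend words

mov eax, 0FFFFh                    ; ~1.0 in 0.16 fixed point
cdq                                ; EDX = 0 (EAX is positive)
div ecx                            ; EAX = 0FFFFh / divisor, e.g. 1337 for 49
movd mm0, eax
pshufw mm0, mm0, 0                 ; copy low word to all four words (SSE / AMD extended MMX)
...
...
pmulhw mm1, mm0                    ; high word of each signed 16x16 product ~ word / divisor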
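A scalar C model of what the code above does to each word, handy for checking results on the host. The function names are hypothetical, and the sketch assumes the compiler does arithmetic right shifts on negative values and two's-complement conversion to `int16_t` (GCC, Clang and MSVC all do).

```c
#include <stdint.h>

/* One lane of pmulhw: signed 16x16 -> 32-bit product, keep the high word. */
static int16_t pmulhw_lane(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(p >> 16);  /* assumes arithmetic shift for negative p */
}

/* Mirrors the posted sequence: 0FFFFh / d once, broadcast, pmulhw per word. */
static void div4_pmulhw(const int16_t in[4], int16_t out[4], uint16_t d) {
    uint16_t recip = (uint16_t)(0xFFFFu / d);  /* mov eax,0FFFFh / cdq / div ecx */
    int16_t m = (int16_t)recip;                /* pmulhw reads it as signed: 0FFFFh is -1 */
    for (int i = 0; i < 4; i++)
        out[i] = pmulhw_lane(in[i], m);
}
```

With the sample data and a divisor of 49 it gives 1, 1, 1, 0: the last word shows the bias towards zero, since 49 * 1337 = 65513 falls just short of 65536. With a divisor of 1 every word comes out as -1.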
Posted on 2002-11-25 13:45:02 by bitRAKE
Hm, if I understand this right, you first compute 1/divisor once and then copy the result into the four words of mm0.
Can you tell me how accurate the result of the division will be? Put simply, I just need integer results... What is the minimal divisor I can use with this technique?
How do I then get integer values from the floating point results? Will normal packing in the last step do the trick? I use it because saturation is just perfect for RGB color values, and I have to write back dword-wise anyway.
Doesn't it matter that the numbers in mm1 are not floating point but ordinary integers when you multiply mm1 (integer) by mm0 (floating point)? Does the CPU check this by itself?
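Since pmulhw returns ordinary signed words, packing works just as for any other word data. A scalar C model of packuswb's saturation rule, with hypothetical function names: each signed word is clamped to 0..255, and the destination register's words become the low four bytes.

```c
#include <stdint.h>

/* packuswb rule for one word: clamp the signed value into 0..255. */
static uint8_t sat_u8(int16_t w) {
    if (w < 0) return 0;
    if (w > 255) return 255;
    return (uint8_t)w;
}

/* packuswb dst, src: bytes 0-3 come from dst's words, bytes 4-7 from src's. */
static void packuswb(const int16_t dst[4], const int16_t src[4], uint8_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i] = sat_u8(dst[i]);
        out[i + 4] = sat_u8(src[i]);
    }
}
```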

Thanks a lot for your great help!
Posted on 2002-11-30 16:25:17 by BugByter
pmulhw keeps the high 16 bits of each signed 32-bit product, so the four results are already integer words; no further conversion is needed. This is all integer math, no floating point: mm0 just holds 1/divisor as a 16-bit fixed-point fraction. I'm not certain of the accuracy; you'll have to test it in your situation. One thing I would like to note is that there is a bias towards zero due to using 0FFFFh rather than 10000h (with a divisor of 49, 49/49 comes out as 0). 10000h can't be used because Variable_WORD might be 1, and then the result of the division wouldn't fit in 16 bits.

Actually, looking at it again, I see that Variable_WORD=1 still doesn't work: 0FFFFh is -1 as a signed word, so pmulhw would have to be changed to pmulhuw. I was just trying to get the divide down to one instruction within the loop, but you could do better with a multiply and a shift. Feel free to post your inner loop code...
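One way to get exact results (an assumption about what "a multiply and a shift" could look like, not necessarily what was meant above) is to round the reciprocal up, m = ceil(65536 / d), and use pmulhuw, whose high-word result is itself the shift right by 16. Like pshufw, pmulhuw needs SSE or AMD's extended MMX. For word values 0..255, such as RGB channels, and divisors 2 to 49 this gives exactly floor(x / d): the rounding-up error in m, multiplied by at most 255, is too small to push any quotient over the next integer. A divisor of 1 is special-cased because 65536 doesn't fit in a word. Scalar C model, names hypothetical:

```c
#include <stdint.h>

/* One lane of pmulhuw: unsigned 16x16 -> 32-bit product, keep the high word. */
static uint16_t pmulhuw_lane(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Rounded-up 0.16 fixed-point reciprocal; fits in 16 bits only for d >= 2. */
static uint16_t recip_ceil(uint16_t d) {
    return (uint16_t)((65536u + d - 1) / d);
}

/* Four words divided by the same d; d == 1 just copies the input. */
static void div4_pmulhuw(const uint16_t in[4], uint16_t out[4], uint16_t d) {
    uint16_t m = recip_ceil(d);  /* unused (wraps to 0) when d == 1 */
    for (int i = 0; i < 4; i++)
        out[i] = (d == 1) ? in[i] : pmulhuw_lane(in[i], m);
}
```

For d = 49 the reciprocal becomes 1338 instead of 1337, which is enough for 49/49 to come out as 1.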
Posted on 2002-11-30 19:52:57 by bitRAKE