:shock:

OK, this is the first time I take a look at this. It seems somewhat easy, though I'm having a little problem.

I want to do large integer addition, and to make it easier, the numbers will be stored in quadwords so there is no need to look at "corner cases" (though I still have some problems with how to convert the large number into its binary representation in quadwords; I think I must supply the numbers as hex instead of decimal to make this translation easier).

OK, returning: I have been trying to understand the wraparound and saturation modes (I don't see much difference between wraparound and signed saturation; perhaps I'm not using the right numbers to show the difference). I have tested the padd/u/s/b variations and they execute OK on my computer.

The problem comes when I use paddq. I don't know if the assembling is incorrect... because OllyDbg is unable to show the opcode...
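To make the hex idea concrete, here is a minimal sketch in C of one way the translation could go. The function name `hex_to_limbs` and the least-significant-limb-first layout are assumptions made for illustration, not something fixed by the post. Each hex digit is exactly 4 bits, so 16 digits fill one quadword and no division is needed, which is what makes hex easier than decimal here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: parse a hex string (most significant digit first,
 * no "0x" prefix) into 64-bit limbs, least significant limb first.
 * Returns the number of limbs used, or 0 on bad input or overflow. */
size_t hex_to_limbs(const char *hex, uint64_t *limbs, size_t max_limbs)
{
    size_t len = strlen(hex);
    size_t count = 0;

    if (len == 0 || max_limbs == 0)
        return 0;

    memset(limbs, 0, max_limbs * sizeof *limbs);

    /* Walk from the last (least significant) digit towards the first. */
    for (size_t i = 0; i < len; i++) {
        char c = hex[len - 1 - i];
        uint64_t digit;

        if (c >= '0' && c <= '9')      digit = (uint64_t)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (uint64_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = (uint64_t)(c - 'A' + 10);
        else return 0;                 /* not a hex digit */

        size_t limb = i / 16;          /* 16 hex digits per quadword */
        if (limb >= max_limbs)
            return 0;                  /* number does not fit */

        limbs[limb] |= digit << (4 * (i % 16));
        count = limb + 1;
    }
    return count;
}
```

With this layout, the string "1ffffffffffffffff" (17 digits) would occupy two limbs: the low quadword holds 0xFFFFFFFFFFFFFFFF and the high quadword holds 1.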
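A small C model of a single byte lane may help show where the three modes differ. This is only a sketch of the lane arithmetic of paddb, paddsb and paddusb; the function names are made up for illustration.

```c
#include <stdint.h>

/* Model of one paddb lane: keep only the low 8 bits of the sum. */
uint8_t add_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Model of one paddsb lane: signed add, clamped to -128..127. */
int8_t add_signed_sat(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Model of one paddusb lane: unsigned add, clamped to 0..255. */
uint8_t add_unsigned_sat(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}
```

With 0xFF + 0x01, wraparound and signed saturation both give 0x00 (as signed bytes this is just -1 + 1 = 0), and only unsigned saturation differs (0xFF). The signed case shows up with operands like 0x7F + 0x01: wraparound gives 0x80, signed saturation stays at 0x7F.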

0FD4C1                  paddq mm0,mm1
0FD4CA                  paddq mm1,mm2
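One way to check the assembler output by hand is to decode the third byte as a ModRM byte. The sketch below (names are illustrative) splits it into its three fields; as far as I can tell from the Intel instruction reference, 0F D4 /r is the MMX-register form of paddq, mod = 3 means register-to-register, reg is the destination and rm is the source.

```c
#include <stdint.h>

/* Fields of a ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;  /* 3 = both operands are registers */
    m.reg = (byte >> 3) & 0x7;  /* destination register number */
    m.rm  = byte & 0x7;         /* source register number */
    return m;
}
```

Decoding 0xC1 gives mod = 3, reg = 0, rm = 1 (mm0, mm1) and 0xCA gives mod = 3, reg = 1, rm = 2 (mm1, mm2), so the bytes look consistent with the mnemonics and the assembler is probably not at fault.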


Though the other padd... instructions work OK on my computer and I can watch the result inside OllyDbg, paddq causes an exception....

Is that possible?

I paste the code I'm using for tests....


segment .data class=data
x dd 0xffFFffFF, 0xffFFffFF
uno dd 0x010101, 0x01010101
orden db 0x11, 0x22, 0x33, 0x44, 0xaa, 0xBB, 0xcc, 0xDD
numero dd 0xaaBBccDD, 0x11223344 ; store in reverse
propagar db 0xff, 0,0,0,  0,0,0,0
n1 db 1,0,0,0,  0,0,0,0

segment .code class=code
%define LINKERUSED GOLINK
MakeEntry
prefetchnta
movq mm0,
movq mm1,
movq mm2, mm0
movq mm3, mm1
movq mm4, mm0
movq mm5, mm1
paddb mm0, mm1
paddsb mm2, mm3
paddusb mm4, mm5
nop
movq mm0,
movq mm1,
movq mm2, mm0
movq mm3, mm1
movq mm4, mm0
movq mm5, mm1
paddb mm0, mm1
paddsb mm2, mm3
paddusb mm4, mm5
nop
nop
movq mm6,
movq mm7,
movq mm0,
movq mm1,
movq mm2, mm0
movq mm3, mm1
movq mm4, mm0
movq mm5, mm1
paddb mm0, mm1
paddsb mm2, mm3
paddusb mm4, mm5
nop
nop
pxor mm7, mm7
movq mm0,
movd mm2,
paddq mm0,mm1 ; this instruction doesn't work... or is badly assembled
paddq mm1,mm2 ; this instruction doesn't work... or is badly assembled
Posted on 2005-06-18 15:22:02 by rea
Now I get it a little more... but I'm more confused... lol. paddq isn't listed under MMX, though it uses the MMX registers (mmN)... Also I have read that paddq was introduced with SSE, but as far as I can see it is not listed in the arithmetic instructions ( ftp://download.intel.com/design/Pentium4/manuals/25366515.pdf section 10.4.1.2), and it is not listed in the section about MMX either (section 9.4 of the same document).

In ftp://download.intel.com/design/Pentium4/manuals/25366715.pdf it is listed, though with no clarification of whether it is MMX or SSE... perhaps an omission in the documentation, or I was more confused about this than I thought.

I guess my machine is somewhat old: it supports MMX and 3DNow!, but if paddq is SSE, then I must figure out how to do large addition with padd/u/s/(b/w/d)....



Also yes, apparently OllyDbg doesn't recognize this opcode as a mnemonic.
Posted on 2005-06-18 18:10:56 by rea
SSE is mostly single-precision floating point SIMD operations.
SSE2 adds double-precision floating point and integer SIMD operations.

So if the instruction you mention (sorry, but I don't have the time to confirm that) is an integer operation, then it's SSE2, not SSE. SSE2 came with the P4, I think.
Posted on 2005-06-18 19:03:58 by ti_mo_n
All this about which is for integer and which is for floating point confuses me, though; apparently I haven't found a reference to paddq under MMX, SSE or SSE2 in the first reference manual... I haven't checked SSE3 because I see no sense in it...


Anyway, returning to how to add these things with only MMX and without paddq: I have more or less proved that in any base, when there is a fixed width at which values wrap around (like a dword or a byte) and two numbers are added, then if the sum (without the carry, because the width is fixed) is greater than either of the two numbers, there is no carry, but if the result is less than either of the numbers added, there is a carry.

Perhaps with an example.

Suppose the LIMIT or width is one digit; anything beyond that is a carry that is ignored, that is, the digit wraps around.

3+3 = 6 ; 6 > 3 or 6 > 3 then no carry
3+6 = 9 ; 9 > 3 or 9 > 6 then no carry
5+5 = 0 ; 0 > 5 or 0 > 5 false then carry
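The same rule can be sketched in C for a number stored as several 32-bit limbs, which is roughly what paddd plus a carry fix-up would have to do. This is an illustrative sketch, not the MMX code itself; `add_limbs` is a made-up name and the limbs are assumed to be stored least significant first. Note the test is "sum less than an operand", which also behaves correctly when one operand is zero (the sum then equals the other operand and no carry is reported).

```c
#include <stddef.h>
#include <stdint.h>

/* result = a + b over n 32-bit limbs, least significant limb first.
 * Returns the carry out of the most significant limb (0 or 1). */
uint32_t add_limbs(uint32_t *result, const uint32_t *a,
                   const uint32_t *b, size_t n)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t sum = a[i] + b[i];         /* wraps modulo 2^32 */
        uint32_t c1 = sum < a[i];           /* wrapped: carry out of a + b */
        uint32_t with_carry = sum + carry;  /* add the incoming carry */
        uint32_t c2 = with_carry < sum;     /* wrapped again: carry out */

        result[i] = with_carry;
        carry = c1 | c2;                    /* the two cannot both be set */
    }
    return carry;
}
```

For example, adding {0xFFFFFFFF, 0} and {1, 0} gives {0, 1} with no final carry: the low limb wraps to 0 and its carry propagates into the high limb.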


movq mm0,
movq mm1,
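movq mm2, mm0      ; keep a copy of the first operand
paddd mm0, mm1     ; mm0 = wrapped dword sums
pcmpgtd mm2, mm0   ; mm2 = all-ones mask where operand > sum (pcmpgtd is a signed compare)
pcmpeqd mm2, mm0   ; mm2 = all-ones mask where the previous mask equals the sum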


Though the order of the last two instructions was more trial and error, hoping that they work based on the previous assumption.

I see that the cmp operations produce a MASK of the width of the operands instead of 1 or 0 :S, so I'm trying to work out how to convert that 0xFFffFFff into 0x1 while 0x0 stays 0x0... that is the problem I'm solving now...
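One thing that may explain the trial and error: pcmpgtd compares dwords as signed values, while carry detection needs an unsigned comparison. A common workaround (a swapped-in technique, not what the code above does) is to flip the sign bit of both operands before the signed compare. The C sketch below models one dword lane; the function names are illustrative, and the cast to int32_t assumes the usual two's complement conversion.

```c
#include <stdint.h>

/* Model of one pcmpgtd lane: signed compare, all-ones mask when a > b. */
uint32_t pcmpgtd_lane(uint32_t a, uint32_t b)
{
    return ((int32_t)a > (int32_t)b) ? 0xFFFFFFFFu : 0u;
}

/* Unsigned a > b built from the signed compare: flipping the top bit of
 * both operands maps unsigned order onto signed order. */
uint32_t unsigned_gt_mask(uint32_t a, uint32_t b)
{
    return pcmpgtd_lane(a ^ 0x80000000u, b ^ 0x80000000u);
}

/* Carry mask for the wrapped sum a + b: all ones where a > sum (unsigned). */
uint32_t carry_mask(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;  /* wraps modulo 2^32, like paddd */
    return unsigned_gt_mask(a, sum);
}
```

In MMX terms the flip would be a pxor of both registers with a constant holding 0x80000000 in each dword before the pcmpgtd. Without it, an operand such as 0xFFFFFFFF is treated as -1 and compares as smaller than 0.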



Edit... I see, ANDing with 1 :P.. lol.

Edit 2: apparently it also will not work if one of the operands is 0... meaning that in "a > c or a > b" the "or" should be replaced with "and" if and only if one of b or c is zero.... :)......
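For reference, here is a small C sketch of three ways to turn the all-ones mask into a carry of 1, modelled on one dword lane. The function names are invented for illustration; the MMX equivalents would presumably be pand with a constant of 1, psrld by 31, and psubd of the mask.

```c
#include <stdint.h>

/* AND with 1: 0xFFFFFFFF -> 1, 0 -> 0 (needs a constant of ones). */
uint32_t mask_to_bit_and(uint32_t mask)
{
    return mask & 1u;
}

/* Logical shift right by 31: same result, no constant needed. */
uint32_t mask_to_bit_shift(uint32_t mask)
{
    return mask >> 31;
}

/* Skip the conversion: subtracting 0xFFFFFFFF (which is -1 modulo 2^32)
 * adds 1, and subtracting 0 leaves the value unchanged. */
uint32_t add_carry_by_sub(uint32_t value, uint32_t mask)
{
    return value - mask;
}
```

The third variant is attractive because the mask can be subtracted directly from the next higher limb, so the carry never has to exist as a separate 0 or 1.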
Posted on 2005-06-18 19:49:24 by rea