:shock:

OK, this is the first time I have looked at this. It seems fairly easy, though I'm having a little problem.

I want to do large-integer addition, and to keep it simple the numbers will be stored in quadwords, so I don't need to look at "corner cases". (I still have some trouble with how to convert a large number into its binary representation as quadwords, but I think I should supply the numbers as hex instead of decimal to make that translation easier.)
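The hex-to-quadword conversion mentioned above is mostly bookkeeping, since 16 hex digits fill exactly one quadword. Here is a minimal sketch in C, assuming the number arrives as a plain hex string without a "0x" prefix and that the least significant quadword is stored first; `hex_digit` and `hex_to_limbs` are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse a hex string (no "0x" prefix) into 64-bit
   limbs, least significant limb first. Returns the number of limbs
   written, or 0 on bad input or when max_limbs is too small. */
size_t hex_to_limbs(const char *hex, uint64_t *limbs, size_t max_limbs)
{
    assert(hex != NULL && limbs != NULL);

    size_t len = strlen(hex);
    size_t needed = (len + 15) / 16;      /* 16 hex digits per quadword */
    if (len == 0 || needed > max_limbs) return 0;

    for (size_t i = 0; i < needed; i++) limbs[i] = 0;

    for (size_t i = 0; i < len; i++) {
        int d = hex_digit(hex[len - 1 - i]);   /* walk from the last digit */
        if (d < 0) return 0;
        limbs[i / 16] |= (uint64_t)d << (4 * (i % 16));
    }
    return needed;
}
```

With decimal input the same job needs repeated multiply-by-10 across all limbs, which is why hex is the easier format here.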

OK, getting back to the point: I have been trying to understand the arithmetic modes (I don't see much difference between *Wraparound* and *Signed Saturation*; perhaps I'm not using the right numbers to show the difference). I have tested the padd variations (padd/u/s/b) and they execute fine on my computer. The problem comes when I use paddq. I don't know whether it is assembled incorrectly, because OllyDbg is unable to show the opcode:

0FD4C1 paddq mm0,mm1

0FD4CA paddq mm1,mm2
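One way to check whether these two encodings are right is to decode the last byte by hand. In the Intel manuals 0F D4 /r is the opcode for PADDQ mm, mm/m64, so only the ModRM byte needs checking. The sketch below is an illustration only; `decode_modrm` is a hypothetical helper, not part of any tool mentioned in the thread.

```c
#include <stdint.h>

/* Fields of a ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm { unsigned mod, reg, rm; };

/* Hypothetical helper: split a ModRM byte into its three fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 3;   /* 3 means both operands are registers */
    m.reg = (byte >> 3) & 7;   /* destination register for 0F D4 /r */
    m.rm  = byte & 7;          /* source register when mod == 3 */
    return m;
}
```

C1 decodes to mod=3, reg=0, rm=1 (mm0, mm1) and CA to mod=3, reg=1, rm=2 (mm1, mm2), so the assembler output looks correct. An exception at that instruction would then be consistent with the CPU not implementing it, since PADDQ arrived with SSE2 even in its mm-register form.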

The other padd instructions work fine on my computer and I can watch the result inside OllyDbg, but paddq causes an exception.

Is that possible?

Here is the code I'm using for the tests:

segment .data class=data

x dd 0xffFFffFF, 0xffFFffFF

uno dd 0x010101, 0x01010101

orden db 0x11, 0x22, 0x33, 0x44, 0xaa, 0xBB, 0xcc, 0xDD

numero dd 0xaaBBccDD, 0x11223344 ; store in reverse

propagar db 0xff, 0,0,0, 0,0,0,0

n1 db 1,0,0,0, 0,0,0,0

segment .code class=code

%define LINKERUSED GOLINK

MakeEntry

prefetchnta

movq mm0,

movq mm1,

movq mm2, mm0

movq mm3, mm1

movq mm4, mm0

movq mm5, mm1

paddb mm0, mm1

paddsb mm2, mm3

paddusb mm4, mm5

nop

movq mm0,

movq mm1,

movq mm2, mm0

movq mm3, mm1

movq mm4, mm0

movq mm5, mm1

paddb mm0, mm1

paddsb mm2, mm3

paddusb mm4, mm5

nop

nop

movq mm6,

movq mm7,

movq mm0,

movq mm1,

movq mm2, mm0

movq mm3, mm1

movq mm4, mm0

movq mm5, mm1

paddb mm0, mm1

paddsb mm2, mm3

paddusb mm4, mm5

nop

nop

pxor mm7, mm7

movq mm0,

movd mm2,

paddq mm0,mm1 ; this instruction doesn't work, or is assembled incorrectly

paddq mm1,mm2 ; this instruction doesn't work, or is assembled incorrectly
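On the wraparound versus saturation question: the three byte adds used in the test code only differ when a lane overflows, and which inputs overflow depends on whether the byte is read as signed or unsigned. The C functions below model a single byte lane of paddb, paddsb and paddusb. They are a sketch for experimenting with input values, with made-up function names, not a description of how the hardware is built.

```c
#include <stdint.h>

/* One byte lane of paddb: the result wraps modulo 256. */
uint8_t add_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* One byte lane of paddsb: signed result clamped to [-128, 127]. */
int8_t add_signed_sat(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* One byte lane of paddusb: unsigned result clamped to [0, 255]. */
uint8_t add_unsigned_sat(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}
```

With 0xFF + 0x01, wraparound and signed saturation both give 0x00 (as signed bytes that is -1 + 1), and only unsigned saturation differs, giving 0xFF. With 0x7F + 0x01, wraparound gives 0x80 while signed saturation stays at 0x7F, so inputs near 0x7F or 0x80 are the ones that separate those two modes.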

Now I get it a little more, but I'm also more confused, lol. paddq isn't listed under MMX, though it uses the mmN registers. I have also read that paddq was introduced with SSE, but as far as I can see it is not listed among the arithmetic instructions ( ftp://download.intel.com/design/Pentium4/manuals/25366515.pdf section 10.4.1.2), and it is not listed in the section about MMX either (section 9.4 of the same document).

In ftp://download.intel.com/design/Pentium4/manuals/25366715.pdf it is listed, but with no clarification of whether it is MMX or SSE. Perhaps it is an omission in the documentation, or I was more confused about this than I thought.

I guess my machine is somewhat old: it supports MMX and 3DNow!, but if paddq is SSE, then I must figure out how to do large addition with padd/u/s/(b/w/d).

Also, yes, apparently OllyDbg doesn't recognize this opcode as a mnemonic.

SSE are floating point SIMD operations

SSE2 are integer SIMD operations

So if the instruction you mention (sorry, but I don't have the time to confirm that) is an integer operation, then it's SSE2, not SSE. SSE2 came with the P4, I think.

All this about which set is for integers and which is for floating point confuses me. In any case, I haven't found a reference to paddq under MMX, SSE or SSE2 in the first reference manual. I haven't checked SSE3 because I don't see the point.

Anyway, getting back to how to add these things with only MMX and without paddq: I have more or less proved that in any base, when there is a fixed width at which values wrap around (like a dword or a byte) and two numbers are added, then if the sum (without the carry, because the width is fixed) is greater than either of the two numbers there is no carry, but if the sum is less than either of the numbers being added there is a carry.

Perhaps an example helps.

Suppose the limit, or the width, is one digit; anything beyond that is a carry that is ignored, i.e. the digit wraps around.

3+3 = 6 ; 6 > 3 or 6 > 3 then no carry

3+6 = 9 ; 9 > 3 or 9 > 6 then no carry

5+5 = 0 ; 0 > 5 or 0 > 5 false then carry
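The same rule can be written for a 32-bit lane in C, where unsigned addition wraps the way paddd does. This is a minimal sketch with a made-up function name. It compares the sum against only one operand using a strict "less than", which is an assumption about how to phrase the test rather than the exact wording used in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit lanes with wraparound and report the lost carry.
   If the true sum fits, sum >= a. If it wrapped, sum = a + b - 2^32,
   which is always smaller than a because b < 2^32. */
uint32_t add_with_carry_out(uint32_t a, uint32_t b, uint32_t *carry)
{
    assert(carry != NULL);

    uint32_t sum = a + b;   /* unsigned arithmetic wraps modulo 2^32 */
    *carry = sum < a;       /* 1 if the addition wrapped, 0 otherwise */
    return sum;
}
```

Because the comparison is strict, operands equal to zero need no special case: a + 0 gives a sum equal to a, which is not less than a, so no carry is reported.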

movq mm0,

movq mm1,

movq mm2, mm0

paddd mm0, mm1

pcmpgtd mm2, mm0

pcmpeqd mm2, mm0

The order of the last two instructions was more a matter of trial and error, hoping that they work based on the previous assumption.

I see that the cmp operations produce a MASK the width of the operands instead of 1 or 0 :S, so I'm trying to work out how to convert that 0xFFffFFff into 0x1 while 0x0 stays 0x0. That is the problem I'm solving now.

Edit: I see, ANDing with 1 :P lol.

Edit 2: apparently it also will not work if one of the operands is 0. That means that instead of a > c "or" a > b, the "or" should be replaced with "and" if and only if one of b or c is zero :)
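Putting the pieces together, here is a rough C model of adding one 64-bit number held as two dwords, using only operations that have an MMX counterpart: paddd, pcmpgtd, pxor and pand. One thing to watch is that pcmpgtd is a signed compare, so the carry test goes wrong whenever the operands differ in their top bit. The usual workaround, assumed here, is to flip the sign bit of both inputs before comparing. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models one dword lane of pcmpgtd: a SIGNED compare that yields an
   all-ones mask when a > b and zero otherwise. The casts assume the
   usual two's complement conversion. */
uint32_t cmpgt_signed_mask(uint32_t a, uint32_t b)
{
    return (int32_t)a > (int32_t)b ? 0xFFFFFFFFu : 0u;
}

/* Unsigned a > b built from the signed compare by flipping the sign
   bit of both inputs first (a pxor with 0x80000000 per lane on MMX). */
uint32_t cmpgt_unsigned_mask(uint32_t a, uint32_t b)
{
    return cmpgt_signed_mask(a ^ 0x80000000u, b ^ 0x80000000u);
}

/* Add two 64-bit numbers stored as {low dword, high dword}. */
void add64_as_dwords(const uint32_t a[2], const uint32_t b[2], uint32_t out[2])
{
    assert(a != NULL && b != NULL && out != NULL);

    uint32_t low = a[0] + b[0];                           /* paddd, low lane */
    uint32_t carry = cmpgt_unsigned_mask(a[0], low) & 1u; /* mask -> 0 or 1 */
    out[0] = low;
    out[1] = a[1] + b[1] + carry;   /* the carry out of bit 63 is dropped */
}
```

The single test "operand greater than sum" covers the zero cases from Edit 2, so no separate "and"/"or" handling is needed. On real MMX the carry also has to be moved from the low lane to the high lane before the second add, which this scalar model does not show.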