hi all,

i was not very good at mathematics at school... :)
and i'm not strong on signed operations in assembler... :(

i need a little help

i have to do the signed sum of two numbers: the first can be negative or positive, the second is always positive...

is there a better way than comparing the values and using "cmp", "neg" instructions, jumps...?

thanks B7
Posted on 2003-08-23 15:50:51 by Bit7
Originally posted by Bit7
is there a better way than comparing the values and using "cmp", "neg" instructions, jumps...?

Thomas
Posted on 2003-08-23 15:59:49 by Thomas
Hi Thomas

what happens if we have a number like:

0111 1111 + 0100 0000 = 1011 1111 <---- negative???

peace
Posted on 2003-08-23 16:04:55 by mistronr1

Hi Thomas

what happens if we have a number like:

0111 1111 + 0100 0000 = 1011 1111 <---- negative???

peace

Since a signed byte range is from -128 to +127, your calculation causes an overflow (127 + 64 > 127). The same thing happens when you add for example 255 and 255 with unsigned arithmetic. In both cases the result is invalid, although the flags will be set so that you can detect this and take the right action.
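To make the flag behaviour concrete, here is a small C sketch of what an 8-bit ADD reports. The `add8` helper and the `add8_result` struct are made-up names for illustration; they only mimic the carry flag (unsigned overflow) and the overflow flag (signed overflow), they are not a CPU emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the result bits plus the two flags of interest. */
typedef struct {
    uint8_t result;   /* the 8 result bits, identical for signed and unsigned */
    bool carry;       /* like CF: the unsigned sum did not fit in 0..255 */
    bool overflow;    /* like OF: the signed sum did not fit in -128..127 */
} add8_result;

/* Add two bytes the way an 8-bit ADD does and report both kinds of overflow. */
static add8_result add8(uint8_t a, uint8_t b)
{
    add8_result r;
    unsigned wide = (unsigned)a + (unsigned)b;

    r.result = (uint8_t)wide;
    r.carry = wide > 0xFF;
    /* Signed overflow: both inputs share a sign bit and the result's sign
       bit differs from it. */
    r.overflow = ((a ^ r.result) & (b ^ r.result) & 0x80) != 0;
    return r;
}
```

With the numbers from the question, `add8(0x7F, 0x40)` gives the bits 0xBF with `overflow` set and `carry` clear, while `add8(0xFF, 0xFF)` gives 0xFE with `carry` set and `overflow` clear (as signed bytes that is -1 + -1 = -2, which is fine).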

Thomas
P.S. signed numbers addiction and addition are two totally different things :grin:...
Posted on 2003-08-23 18:00:22 by Thomas
I'm having a hard time getting just what Bit7 wants, but has it been made clear that you can use .if/.endif to compare signed numbers?

``````
mov eax, PositiveOrNegative
mov edx, PositiveOnly
.if( SDWORD PTR eax >= 0 )
    ; eax is zero or positive
.else
    ; eax is negative
.endif
``````

Maybe i'm way off base here.. i dunno..
:alright:
NaN
Posted on 2003-08-23 19:31:09 by NaN
I would suggest the OR instruction to test whether the first number is negative or positive. If it is negative, the sign flag will be set. In that case, adding a positive number to it can never cause an overflow. The sign of the result will depend on which of the two has the greater absolute value.

If both are positive, you then have to test the result for an overflow which would set the sign flag and indicate an invalid result.

The following example uses 8-bit registers but you can modify it for any size register.
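``````
mov   al,firstnum
or    al,al       ;sets the sign flag if firstnum is negative
jns   @F          ;jump if first number is positive
add   al,secondnum ;negative + positive can never overflow
jmp   finish
@@:
add   al,secondnum ;secondnum: assumed name for the positive operand
js    OVERFLOW    ;go handle the overflow error
finish:
``````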

Raymond
Posted on 2003-08-23 21:56:50 by Raymond
thanks all, and sorry for my bad explanation.

Yes, my problem was the addition of a positive or negative number with a positive number.

So it seems i have to introduce comparisons and jumps... i would like to avoid this because i have a big routine with about 16 of these operations, and i think that many "cmp/jump" pairs will make my routine run too slowly.....

Posted on 2003-08-24 00:09:42 by Bit7
Bit7, maybe you're just getting confused by the fact that the same add and sub instructions are used for both signed and unsigned. In your example

0111 1111 + 0100 0000 = 1011 1111 <---- negative???

The answer is both negative and positive; it depends on which instructions you use with it from then on. If you use e.g. mul or jb, the number is treated as positive (unsigned), whereas idiv or jg would treat it as negative (signed). With add and sub it doesn't matter.

In floating point all numbers are signed, you can't get unsigned numbers.
Posted on 2003-08-24 06:48:19 by Eóin
I think Eoin is correct. It does not matter whether the number is positive or not; what really matters is whether you are adding unsigned numbers to signed numbers or something like that.

If both are signed, no problem, just use plain add. If one is signed and the other is unsigned, then it makes things hard.
Posted on 2003-08-24 07:02:21 by roticv
Bit7, there's an important bit you don't mention. When you say the second number is positive, do you mean positive in the signed range 0-127 or in the unsigned range 0-255?

If it's in the signed positive range then don't worry about checking signs, just add and sub while watching out for overflows. If it's in the unsigned range then it only makes sense to treat the answer as an unsigned number. E.g. -10 + 200 = 190 if you're taking unsigned numbers, or -66 if you're taking signed.

The lesson here is: don't mix signed and unsigned numbers too much. Positive and negative are ok to mix, just keep them within their signed ranges.
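The -10 + 200 case can be tried out in C. This is only a sketch; `add_bytes`, `as_unsigned` and `as_signed` are names invented for the example. The point is that the addition produces one set of 8 bits, and only the way you read them afterwards differs.

```c
#include <stdint.h>

/* Add two bytes as the CPU does: keep only the low 8 bits of the sum. */
static uint8_t add_bytes(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Read a byte as an unsigned value (0..255). */
static int as_unsigned(uint8_t bits)
{
    return bits;
}

/* Read the same byte as a two's complement signed value (-128..127). */
static int as_signed(uint8_t bits)
{
    return bits < 128 ? bits : (int)bits - 256;
}
```

`add_bytes((uint8_t)-10, 200)` yields the bits 1011 1110; `as_unsigned` reads them as 190 and `as_signed` reads them as -66, matching the numbers in the post.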
Posted on 2003-08-24 07:37:56 by Eóin
roticv, Eóin, thanks again,

i think i have understood that add and sub just work on the numbers and don't care about the sign, so i have to check and manage it myself.

Since i've discovered i need just one more operation, where both numbers could be negative or positive... i decided to go for floating point to make it easier.

``````

fild	[y]
fiadd	[cr]		; can be positive or negative
fidiv	[i256]
fiadd	[RTable2]	; can be positive or negative
fistp	[offTable]
mov	al, [byte ptr offTable]
mov	[byte ptr edi+2], al

``````

so i let the FPU do the homework for me :) but i think that using the FPU makes my routine slower... does it?
Posted on 2003-08-24 08:49:10 by Bit7
It will make it very much slower, especially that division instruction.

If you are actually dividing by 256, you could then use the very fast SHR instruction instead of the DIV with the CPU.

Raymond
Posted on 2003-08-24 09:12:04 by Raymond
All this guesswork. It would be a lot more efficient to help you if we knew roughly what your algorithm is meant to be.. (I can interpret some of it from your floating point listing), but i'm not sure where the end result is....

:NaN:
Posted on 2003-08-24 10:22:28 by NaN
it's just a routine i'm converting from C.
The routine converts a YUV 4:1:1 image to RGB, 24 bits per pixel.

since you're asking, this is the main loop code

``````
while (wly--) {
    for (i=0; i<lx; i+=4) {
        cr=*crp++;
        cr-=128;
        cb=*cbp++;
        cb-=128;
        cg=cr;
        cg+=cb;
        cr*=409;
        cg*=-617;
        cb*=517;
        cg+=cr;
        cg+=cb;

        y=MulTable[*src++];
        *rp=Table[(y+cr)>>8];
        rp+=rinc;
        *gp=Table[(y+cg)>>8];
        gp+=ginc;
        *bp=Table[(y+cb)>>8];
        bp+=binc;

        y=MulTable[*src++];
        *rp=Table[(y+cr)>>8];
        rp+=rinc;
        *gp=Table[(y+cg)>>8];
        gp+=ginc;
        *bp=Table[(y+cb)>>8];
        bp+=binc;

        y=MulTable[*src++];
        *rp=Table[(y+cr)>>8];
        rp+=rinc;
        *gp=Table[(y+cg)>>8];
        gp+=ginc;
        *bp=Table[(y+cb)>>8];
        bp+=binc;

        y=MulTable[*src++];
        *rp=Table[(y+cr)>>8];
        rp+=rinc;
        *gp=Table[(y+cg)>>8];
        gp+=ginc;
        *bp=Table[(y+cb)>>8];
        bp+=binc;
    }
}
}

``````

i've just converted it and it seems to work... but i still have to find out whether it is faster or slower heheheh

B7
Posted on 2003-08-24 11:00:29 by Bit7

If you are actually dividing by 256, you could then use the very fast SHR instruction instead of the DIV with the CPU.
Of course, SAR for signed numbers - the FPU only works with signed numbers.
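The difference between the two shifts is easy to see on a negative number. The sketch below uses made-up helper names `shr8` and `sar8`. Right-shifting a negative signed value is implementation-defined in C, so `sar8` is written with a division that rounds toward minus infinity, which is the result SAR gives. It assumes the input is not within 255 of the most negative 32-bit value.

```c
#include <stdint.h>

/* What SHR does: zero bits come in from the left, so the value is treated
   as unsigned. */
static uint32_t shr8(int32_t v)
{
    return (uint32_t)v >> 8;
}

/* What SAR does: copies of the sign bit come in from the left, which equals
   floor(v / 256). The bias of 255 turns C's truncating division into a
   floor for negative inputs. */
static int32_t sar8(int32_t v)
{
    return v >= 0 ? v / 256 : (v - 255) / 256;
}
```

For -1000, `sar8` returns -4, `shr8` returns 16777212, and a plain signed division (`-1000 / 256` in C, or IDIV) returns -3 because it rounds toward zero. So SAR is the right replacement for signed values, but it can differ by one from a true division on negative inputs.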
Posted on 2003-08-24 11:12:01 by bitRAKE
A good rule to remember after the

sub
cmp

instructions: use these jumps when dealing with unsigned numbers only:

JB, JBE, JA, JAE (these test the Carry flag; JBE and JA also check the Zero flag)

and use these jumps when dealing with signed numbers only:

JL, JLE, JG, JGE (these test the Sign & Overflow flags; JLE and JG also check the Zero flag)
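A quick way to convince yourself of the rule is to compare the same two bytes both ways in C. The function names here are invented for the example; `below_unsigned` corresponds to the test JB makes after a CMP, `less_signed` to the test JL makes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read a byte as a two's complement signed value (-128..127). */
static int to_signed(uint8_t bits)
{
    return bits < 128 ? bits : (int)bits - 256;
}

/* Unsigned "below": the condition JB tests after CMP a,b. */
static bool below_unsigned(uint8_t a, uint8_t b)
{
    return a < b;
}

/* Signed "less": the condition JL tests after CMP a,b. */
static bool less_signed(uint8_t a, uint8_t b)
{
    return to_signed(a) < to_signed(b);
}
```

With a = 0xFF and b = 0x01, `below_unsigned` is false (255 is not below 1) while `less_signed` is true (-1 is less than 1). Same bits, opposite branch, which is why picking the wrong jump family gives wrong results.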

farrier
Posted on 2003-08-24 14:05:55 by farrier
The nice thing about two's complement arithmetic is that adding (and subtracting) numbers is exactly the same for signed and unsigned. That's why, unlike multiply and divide, there aren't separate signed/unsigned versions of ADD and SUB. As always, arithmetic overflow will invalidate your results (unless you want the "wraparound" feature.)
Posted on 2003-08-25 16:29:43 by tenkey
thanks all again, now i've understood what i was missing:

for some stupid reason (i have some big black holes in my knowledge of signed numbers), i thought that

11111111 = -128
10000001 = -1
:stupid:

when in fact
11111111 = -1
10000000 = -128
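One way to remember the correct reading: in a two's complement byte the top bit weighs -128 and the other seven bits weigh +64 down to +1. A small C sketch of that rule (the function name `twos_complement_value` is made up for the example, and it expects a string of eight '0'/'1' characters):

```c
/* Value of an 8-character string of '0'/'1' digits read as a two's
   complement byte: the top bit weighs -128, the others +64 ... +1. */
static int twos_complement_value(const char *bits)
{
    static const int weight[8] = { -128, 64, 32, 16, 8, 4, 2, 1 };
    int value = 0;

    for (int i = 0; i < 8 && bits[i] != '\0'; i++) {
        if (bits[i] == '1')
            value += weight[i];
    }
    return value;
}
```

`twos_complement_value("11111111")` is -128 + 127 = -1, `"10000000"` is -128, and `"10000001"` is -127 rather than -1.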

so i've also quickly discovered why the Intel processors have imul and idiv but no iadd or isub :)
It would be worth a moderator setting my "posted" count to 0, or better, -1 :)

Ok, now i've learned it the hard way and i hope i'll remember it well.
So on Sunday, when the guys of my town go to the beach, i decided to change my YUV 4:1:1 to RGB routine, avoiding the slow floating point instructions.
This is the latest release :)

``````
;---------------------------------------------------------------------------
proc		ImageToRGB uses ebx edx esi edi, src:dword, dst:dword, dstpitch:dword, lx:dword, ly:dword

;/*      src : image in CM_YUV411P format
;/*      dst : destination buffer
;/* dstpitch : dest. line pitch, pass zero to get the default
;/*    lx,ly : image dimensions in pixels
;/*  wy, wly : sub-image window to process (0,0 for all)

;B = 1.164(Y - 16) + 2.018(U - 128)
;G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
;R = 1.164(Y - 16) + 1.596(V - 128)

dataseg

RoundTable	db	300 dup(0), 256 dup(0), 300 dup(255)
MulTable	dd	256 dup(0)
Inited		dd	0

codeseg

LOCAL   i, y, RTable2: dword
LOCAL	cr, cg, cb : dword

cmp	[Inited],0
jne	@@st03

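; Round-Table initialization: fill the middle 256 entries with 0..255
or	[Inited],1
lea 	edi,[RoundTable]
add	edi,300			; skip the 300 leading zero entries
xor	ecx,ecx
@@st00:		mov	[byte ptr edi],cl
inc	ecx
inc	edi
cmp	ecx,256
jl	@@st00

; MulTable initialization: MulTable[i] = 298 * (i - 16)
lea 	edi,[MulTable]
xor	ecx,ecx
@@st02:		mov	eax,ecx
sub	eax,16
imul	eax,298
mov	[dword ptr edi],eax
add	edi,4			; next dword entry
inc	ecx
cmp	ecx,256
jl	@@st02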

; set up the chroma red and chroma blue pointers

@@st03:		push	offset RoundTable + 300
pop	[RTable2]

mov	esi,[src]		; esi = luminance pointer

mov	eax,[lx]
imul	eax,[ly]
push	eax
mov	ebx,eax
add	ebx,[src]		; ebx = chroma red pointer

pop	eax
shr	eax,2
mov	edx,ebx
add	edx,eax			; edx = chroma blue pointer

mov	edi,[dst]		; edi RGB blue ptr

; conversion loop

@@st04:		xor	ecx,ecx

@@st05:		movzx	eax,[byte ptr ebx]
mov	[cr],eax
sub	[cr],128

movzx	eax,[byte ptr edx]
mov	[cb],eax
sub	[cb],128

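push	[cr]
pop	[cg]

mov	eax,[cb]
add	[cg],eax		; cg = cr + cb

mov	eax,[cr]
imul	eax,409
mov	[cr],eax		; cr *= 409

mov	eax,[cg]
imul	eax,-617
mov	[cg],eax		; cg *= -617

mov	eax,[cb]
imul	eax,517
mov	[cb],eax		; cb *= 517

mov	eax,[cg]
add	eax,[cr]
add	eax,[cb]
mov	[cg],eax		; cg += cr + cb, as in the C version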

; Each pixel: y = MulTable[*src], then channel = Table[(y + chroma) >> 8].
; Byte order B,G,R is assumed from the "edi RGB blue ptr" comment above.

movzx	eax,[byte ptr esi]
shl	eax,2
push	[dword ptr offset MulTable + eax]
pop	[y]

mov	eax,[y]
add	eax,[cb]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi],al	; blue

mov	eax,[y]
add	eax,[cg]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+1],al	; green

mov	eax,[y]
add	eax,[cr]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+2],al	; red

inc	esi			; increment luminance ptr
add	edi,3			; increment RGB ptr

movzx	eax,[byte ptr esi]
shl	eax,2
push	[dword ptr offset MulTable + eax]
pop	[y]

mov	eax,[y]
add	eax,[cb]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi],al	; blue

mov	eax,[y]
add	eax,[cg]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+1],al	; green

mov	eax,[y]
add	eax,[cr]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+2],al	; red

inc	esi			; increment luminance ptr
add	edi,3			; increment RGB ptr

movzx	eax,[byte ptr esi]
shl	eax,2
push	[dword ptr offset MulTable + eax]
pop	[y]

mov	eax,[y]
add	eax,[cb]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi],al	; blue

mov	eax,[y]
add	eax,[cg]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+1],al	; green

mov	eax,[y]
add	eax,[cr]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+2],al	; red

inc	esi			; increment luminance ptr
add	edi,3			; increment RGB ptr

movzx	eax,[byte ptr esi]
shl	eax,2
push	[dword ptr offset MulTable + eax]
pop	[y]

mov	eax,[y]
add	eax,[cb]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi],al	; blue

mov	eax,[y]
add	eax,[cg]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+1],al	; green

mov	eax,[y]
add	eax,[cr]
sar	eax,8
add	eax,[RTable2]
mov	al,[byte ptr eax]
mov	[byte ptr edi+2],al	; red

inc	esi			; increment luminance ptr
inc	ebx			; increment chroma red ptr
inc	edx			; increment chroma blue ptr
add	edi,3			; increment RGB ptr

add	ecx,4			; four pixels done
cmp	ecx,[lx]
jl	@@st05

dec	[ly]
cmp	[ly],0
jg	@@st04

ret

endp		ImageToRGB

``````
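For checking the assembly output against something simpler, here is a small C reference sketch of the same fixed-point scheme: a luminance table of 298 * (Y - 16), a clamp table with 300 guard entries on each side, and one pixel converted at a time. The names `mul_table`, `clamp_table`, `init_tables` and `yuv_to_bgr` are invented for this sketch, the B,G,R output order is an assumption, and green uses -208 and -100 directly, which is algebraically what the -617/409/517 sequence in the listing works out to.

```c
#include <stdint.h>

static int32_t mul_table[256];               /* 298 * (Y - 16) */
static uint8_t clamp_table[300 + 256 + 300]; /* 0 ... 0, 0..255, 255 ... 255 */

/* floor(v / 256), i.e. what SAR by 8 gives, written portably. */
static int32_t sar8(int32_t v)
{
    return v >= 0 ? v / 256 : (v - 255) / 256;
}

/* Fill both tables once, mirroring the initialisation in the listing. */
static void init_tables(void)
{
    for (int i = 0; i < 300; i++) clamp_table[i] = 0;
    for (int i = 0; i < 256; i++) clamp_table[300 + i] = (uint8_t)i;
    for (int i = 0; i < 300; i++) clamp_table[556 + i] = 255;
    for (int i = 0; i < 256; i++) mul_table[i] = (i - 16) * 298;
}

/* Convert one pixel; out[0], out[1], out[2] receive blue, green, red.
   u is the blue chroma (Cb), v is the red chroma (Cr). */
static void yuv_to_bgr(uint8_t y, uint8_t u, uint8_t v, uint8_t out[3])
{
    int32_t cb = (int32_t)u - 128;
    int32_t cr = (int32_t)v - 128;
    int32_t luma = mul_table[y];

    out[0] = clamp_table[300 + sar8(luma + 517 * cb)];
    out[1] = clamp_table[300 + sar8(luma - 208 * cr - 100 * cb)];
    out[2] = clamp_table[300 + sar8(luma + 409 * cr)];
}
```

After `init_tables()`, the input (16, 128, 128) gives black (0, 0, 0) and (235, 128, 128) gives (254, 254, 254). Feeding the same inputs to both routines and comparing the bytes is a cheap way to catch a missing add or a swapped channel.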

I will do some measurements, my purpose is to make it faster than the vc++ one, any suggestion/trick is appreciated.

Thanks all again, B7
Posted on 2003-08-26 16:45:18 by Bit7