I'm writing a restoring division algorithm (for S&G mostly) and have run into a problem. First, here's my code:



DIV1 proc Dividend:DWORD, Divisor:DWORD

; Remainder register - ebx:eax
; Divisor register - edx

mov eax, Dividend ; Remainder (right)
xor ebx, ebx ; Remainder (left)
mov ecx, 32 ; Counter
mov edx, Divisor ; Divisor

shld ebx, eax, 1 ; Shift the Remainder register left 1 bit

DIV1_STEP1:
sub ebx, edx ; Subtract the Divisor register from the left half of the Remainder
js DIV1_STEP2B ; Test Remainder

; If the Remainder is positive (the subtraction was a success)
shld ebx, eax, 1 ; Shift the Remainder register left 1 bit
or al, 01h ; Set the rightmost bit to 1
jmp DIV1_STEP3

DIV1_STEP2B:
; If the Remainder is negative (the subtraction failed)
add ebx, edx ; Restore the original value to the left half of the Remainder
shld ebx, eax, 1 ; Shift the Remainder register left 1 bit

DIV1_STEP3:
loop DIV1_STEP1 ; Repeat 32 times

shr ebx, 1 ; Shift the right half of the Remainder right 1 bit

ret
DIV1 endp


My problem is that the 15th line (js DIV1_STEP2B) always jumps to step 2B - maybe I'm using js wrong... I don't know. As a result, the Dividend is always returned as the answer. Obviously I'm don't something wrong, but I can't figure out what it is. Any help would be appreciated.

Spara
Posted on 2004-08-01 19:30:02 by SowWn
Some important things to note:

- I don't know what this code is suppose to do.
- SHLD only shifts the first (destination) operand.
- if the algo operates on unsigned data use JC, for signed data use JL.

I believe you want to do this:

shld ebx, eax, 1
shl eax, 1

...rather than just the first instruction, but it is more efficient to use:

add eax, eax
adc ebx, ebx

I thought SHLD worked that way when I first tried to use it, now I don't use it at all.

Okay, I see - you are trying to divide by scaled sutraction.
Sorry, it is late and I was pulling a blank.
Posted on 2004-08-01 23:12:05 by bitRAKE
Yep, do what he said. And you can remove a few instructions by doing this:
DIV1 proc Dividend:DWORD, Divisor:DWORD

mov eax,Dividend
xor ebx,ebx
push 32
pop ecx
mov edx,Divisor
beginning:
add eax,eax
adc ebx,ebx
sub ebx,edx
jae one
add ebx,edx
one:
sbb eax,-1
loop beginning
ret

It doesn't do anything you can't do with the DIV instruction, but increasing the divisor size should be trivial.
Posted on 2004-08-02 12:14:10 by Sephiroth3
Ok, I can see how

add eax, eax
adc ebx, ebx

works in place of

shld ebx, eax, 1
shl eax, 1

but is there a simmilar trick for shrd?

@Sephiroth3: Thanks for the hints... I probably would have done that eventualy, but right now I'm trying to follow an algorithm in a book as close as possible. I don't quite understand the "sbb eax, -1" statement, could you explain that?

Spara

EDIT: Replacing the SHLD made the function work perfectly. Thanks.
Posted on 2004-08-02 15:13:55 by SowWn

Ok, I can see how

add eax, eax
adc ebx, ebx

works in place of

shld ebx, eax, 1
shl eax, 1

but is there a simmilar trick for shrd?
shr eax, 1
rcr ebx, 1

...should do the trick. SHRD/SHLD is only really useful for multi-bit shifts where MMX can't be used. Actually, a branchless MMX/SSE version of this routine would be an interesting challenge (and maybe faster than other methods for numbers in the 64/128-bit range?).
Posted on 2004-08-02 17:25:05 by bitRAKE
I'm not so sure if parallelism is possible here, since you need the remainder of the high result to computer the low result...

Spara: The sbb eax,-1 will increase eax if the carry flag is clear.
Posted on 2004-08-03 08:52:47 by Sephiroth3

I'm not so sure if parallelism is possible here, since you need the remainder of the high result to computer the low result...
Very true, but I meant to just code a branchless version of the algo - not to parallize. Un-predicted branches are killer on a big pipe and AMD has cheap shift.
Posted on 2004-08-03 18:56:12 by bitRAKE
Pardon my ignorance, but how could you write a branceless division algorithm? As far as I know, there are only a few division algorithms: Restoring/Nonrestoring, Booth's, and SRT. All of these use some sort of comparison and decision inside the loop. Can you explain how you would write a branchless version of any of the above? I'd be interested in writing it if I knew what you were talking about.

Spara
Posted on 2004-08-03 20:30:17 by SowWn
Well, I said it'd be a challenge. ;) MMX allows branchless choices using masks - I was thinking something along those lines. I have other work ATM or I'd code it up. So many ideas - so little time!
Posted on 2004-08-03 21:58:59 by bitRAKE