I'm writing a restoring division algorithm (for S&G mostly) and have run into a problem. First, here's my code:

``````
DIV1 proc Dividend:DWORD, Divisor:DWORD

; Remainder register - ebx:eax
; Divisor register   -     edx

mov       eax, Dividend       ; Remainder (right)
xor       ebx, ebx            ; Remainder (left)
mov       ecx, 32             ; Counter
mov       edx, Divisor        ; Divisor

shld      ebx, eax, 1         ; Shift the Remainder register left 1 bit

DIV1_STEP1:
sub       ebx, edx            ; Subtract the Divisor register from the left half of the Remainder
js        DIV1_STEP2B         ; Test Remainder

; If the Remainder is positive (the subtraction was a success)
shld      ebx, eax, 1         ; Shift the Remainder register left 1 bit
or        al, 01h             ; Set the rightmost bit to 1
jmp       DIV1_STEP3

DIV1_STEP2B:
; If the Remainder is negative (the subtraction failed)
add       ebx, edx            ; Restore the original value to the left half of the Remainder
shld      ebx, eax, 1         ; Shift the Remainder register left 1 bit

DIV1_STEP3:
loop      DIV1_STEP1          ; Repeat 32 times

shr       ebx, 1              ; Shift the right half of the Remainder right 1 bit

ret
DIV1 endp
``````

My problem is that the 15th line (js DIV1_STEP2B) always jumps to step 2B - maybe I'm using js wrong... I don't know. As a result, the Dividend is always returned as the answer. Obviously I'm don't something wrong, but I can't figure out what it is. Any help would be appreciated.

Spara
Posted on 2004-08-01 19:30:02 by SowWn
Some important things to note:

- I don't know what this code is suppose to do.
- SHLD only shifts the first (destination) operand.
- if the algo operates on unsigned data use JC, for signed data use JL.

I believe you want to do this:

shld ebx, eax, 1
shl eax, 1

...rather than just the first instruction, but it is more efficient to use:

I thought SHLD worked that way when I first tried to use it, now I don't use it at all.

Okay, I see - you are trying to divide by scaled sutraction.
Sorry, it is late and I was pulling a blank.
Posted on 2004-08-01 23:12:05 by bitRAKE
Yep, do what he said. And you can remove a few instructions by doing this:
``````DIV1 proc Dividend:DWORD, Divisor:DWORD
mov eax,Dividend
xor ebx,ebx
push 32
pop ecx
mov edx,Divisor
beginning:
sub ebx,edx
jae one
one:
sbb eax,-1
loop beginning
ret``````

It doesn't do anything you can't do with the DIV instruction, but increasing the divisor size should be trivial.
Posted on 2004-08-02 12:14:10 by Sephiroth3
Ok, I can see how

works in place of

shld ebx, eax, 1
shl eax, 1

but is there a simmilar trick for shrd?

@Sephiroth3: Thanks for the hints... I probably would have done that eventualy, but right now I'm trying to follow an algorithm in a book as close as possible. I don't quite understand the "sbb eax, -1" statement, could you explain that?

Spara

EDIT: Replacing the SHLD made the function work perfectly. Thanks.
Posted on 2004-08-02 15:13:55 by SowWn

Ok, I can see how

works in place of

shld ebx, eax, 1
shl eax, 1

but is there a simmilar trick for shrd?
shr eax, 1
rcr ebx, 1

...should do the trick. SHRD/SHLD is only really useful for multi-bit shifts where MMX can't be used. Actually, a branchless MMX/SSE version of this routine would be an interesting challenge (and maybe faster than other methods for numbers in the 64/128-bit range?).
Posted on 2004-08-02 17:25:05 by bitRAKE
I'm not so sure if parallelism is possible here, since you need the remainder of the high result to computer the low result...

Spara: The sbb eax,-1 will increase eax if the carry flag is clear.
Posted on 2004-08-03 08:52:47 by Sephiroth3

I'm not so sure if parallelism is possible here, since you need the remainder of the high result to computer the low result...
Very true, but I meant to just code a branchless version of the algo - not to parallize. Un-predicted branches are killer on a big pipe and AMD has cheap shift.
Posted on 2004-08-03 18:56:12 by bitRAKE
Pardon my ignorance, but how could you write a branceless division algorithm? As far as I know, there are only a few division algorithms: Restoring/Nonrestoring, Booth's, and SRT. All of these use some sort of comparison and decision inside the loop. Can you explain how you would write a branchless version of any of the above? I'd be interested in writing it if I knew what you were talking about.

Spara
Posted on 2004-08-03 20:30:17 by SowWn
Well, I said it'd be a challenge. ;) MMX allows branchless choices using masks - I was thinking something along those lines. I have other work ATM or I'd code it up. So many ideas - so little time!
Posted on 2004-08-03 21:58:59 by bitRAKE