Hi all,

First: THANK YOU all for an excellent forum! :)

Second:

I want to optimize the division of a 64-bit integer by a 32-bit integer.

The result may be a rounded 64-bit integer.

If possible, a flag should be set to show that the result was rounded.

The 64-bit operand is stored as two dwords, LoOperand and HiOperand.

The result should also be an integer stored as two dwords, LoResult and HiResult.

Any tricks from the gurus?

SgtPepper

The first trick is......

Post some code.

This assumes your divisor is 32-bit unsigned. A signed and/or 64-bit divisor makes it much more complicated.

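```
;In  <- edx:eax = 64 bit dividend
;    <- ecx     = 32 bit divisor
push eax            ; save low dword of the dividend
mov  eax, edx       ; eax = high dword
xor  edx, edx       ; edx:eax = 0:high
div  ecx            ; eax = high quotient, edx = remainder
xchg eax, [esp]     ; stash high quotient, reload low dword
div  ecx            ; eax = low quotient, edx = final remainder
pop  edx            ; edx = high quotient
;Out -> edx:eax = 64 bit quotient.
```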
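For anyone who would rather see the arithmetic than the registers, the two `div` steps can be modelled in C roughly like this. It is only a sketch: the names `div64by32` and `divresult` are made up for illustration, and a compiler will not necessarily emit the same instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: quotient split into two dwords, plus remainder. */
typedef struct {
    uint32_t lo;   /* low dword of the quotient  (eax on exit) */
    uint32_t hi;   /* high dword of the quotient (edx on exit) */
    uint32_t rem;  /* remainder left after the second division */
} divresult;

/* Schoolbook division in base 2^32, mirroring the two div instructions. */
divresult div64by32(uint32_t hi, uint32_t lo, uint32_t divisor)
{
    divresult r;
    assert(divisor != 0);                 /* div would fault on zero */

    /* First div: 0:hi / divisor */
    r.hi = hi / divisor;
    uint32_t carry = hi % divisor;        /* always < divisor */

    /* Second div: carry:lo / divisor. Because carry < divisor,
       the quotient is guaranteed to fit in 32 bits. */
    uint64_t part = ((uint64_t)carry << 32) | lo;
    r.lo  = (uint32_t)(part / divisor);
    r.rem = (uint32_t)(part % divisor);
    return r;
}
```

The point of zeroing edx before the first `div` is the same as the `carry < divisor` comment: it keeps each 64-by-32 step from overflowing its 32-bit quotient.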

Of course this could all be done better with MMX.

Of course this could all be done better with MMX

MMX?

Did you mean FPU?

Of course there may be several algorithms to do it with MMX, but I haven't seen one that is at least as fast as the FPU.

The only other algo I've seen to do it used MMX. I haven't seen an FPU version although I imagine it would be relatively easy to do.

FPU version is too easy.

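```
fild  qword ptr [64bit int]      ; load the 64-bit dividend (signed)
fild  dword ptr [32bit int]      ; load the 32-bit divisor (signed)
fdivp                            ; fdiv if MASM
fistp qword ptr [64bit result]   ; store the quotient as a 64-bit integer
```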

However, if you want an 'unsigned' operation, it is not a solution: **fild** always treats the MSB as the sign bit.

Anyway, it is usually much faster than integer ops + conditional jumps. I once timed it against __divdi3() in libgcc (with a 64-bit divisor), and the FPU version was at least twice as fast as __divdi3(). (Of course, this is not a fair comparison to an optimized integer-ops version, if there is one.)
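To make the sign-bit caveat concrete, here is a small C sketch of what goes wrong when a large unsigned dividend is read as signed. The function name `div_as_signed` is hypothetical, and plain integer division stands in for the FPU here, so it truncates rather than rounds; it only illustrates the reinterpretation, not the FPU code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 64-bit pattern as signed (as fild would) and divide. */
int64_t div_as_signed(uint64_t dividend_bits, int32_t divisor)
{
    int64_t as_signed;
    assert(divisor != 0);
    /* Copy the bits unchanged; the top bit now acts as the sign. */
    memcpy(&as_signed, &dividend_bits, sizeof as_signed);
    /* INT64_MIN / -1 overflows, so this sketch excludes it. */
    assert(!(as_signed == INT64_MIN && divisor == -1));
    return as_signed / divisor;
}
```

With the dividend 0xFFFFFFFFFFFFFFFE and divisor 2, the unsigned quotient is 0x7FFFFFFFFFFFFFFF, but the signed reading sees -2 / 2 and returns -1. Dividends below 2^63 are unaffected.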

Thanks!

This is my final implementation.

Remainder is in ecx.

;In <- edx:eax = 64 bit dividend

; <- ecx = 32 bit divisor

push eax

mov eax, edx

xor edx, edx

div ecx

xchg eax, [esp]

div ecx

xchg edx, [esp]

pop ecx

;Out -> edx:eax = 64 bit quotient.

;Out -> ecx = remainder

/SgtPepper
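Since the original question also asked for a rounded result and a flag, here is one way the remainder could be used for that, sketched in C. The name `div_round`, the `inexact` flag and the choice to round ties upward are all assumptions; the thread does not say which rounding rule is wanted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned division rounded to nearest (ties go up).
   *inexact is set when the remainder was nonzero. */
uint64_t div_round(uint64_t dividend, uint32_t divisor, bool *inexact)
{
    assert(divisor != 0);
    uint64_t q = dividend / divisor;
    uint32_t r = (uint32_t)(dividend % divisor);

    *inexact = (r != 0);

    /* r >= divisor - r is the same test as 2*r >= divisor,
       written so that it cannot overflow 32 bits. */
    if (r >= divisor - r)
        q += 1;
    return q;
}
```

Note that in the assembly above ecx holds the remainder on exit and no longer the divisor, so the divisor would have to be kept somewhere else for this comparison.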
