Hi all,
First: THANK YOU all for an excellent forum! :)
Second:
I want to optimize the division of a 64-bit integer by a 32-bit integer.
The result is allowed to be a rounded 64-bit integer.
If possible, a flag should be set to show that the result was rounded.
The 64-bit value is stored as two DWORDs, LoOperand and HiOperand.
The result should be stored as two DWORDs, LoResult and HiResult.
Any trick from the gurus??
SgtPepper
The first trick is......
Post some code.
This assumes your divisor is 32-bit unsigned. A signed and/or 64-bit divisor makes it much more complicated.
Of course this could all be done better with MMX.
;In <- edx:eax = 64 bit dividend
; <- ecx = 32bit divisor
push eax
mov eax, edx
xor edx, edx
div ecx
xchg eax, [esp]
div ecx
pop edx
;Out -> edx:eax = 64 bit quotient.
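For anyone who finds the two DIVs hard to follow, here is a rough C model of the same schoolbook long division. It is only a sketch: the names `div64_result` and `div64_by_32` are made up for illustration, and a compiler will not necessarily emit the instruction sequence above. The first division handles the high dword; its remainder becomes the high half of the second dividend. Because that remainder is always smaller than the divisor, the second quotient fits in 32 bits, which is why the second `div ecx` cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: mirrors edx:eax plus the remainder
   that the second DIV leaves behind. */
typedef struct {
    uint32_t hi;   /* upper dword of the quotient */
    uint32_t lo;   /* lower dword of the quotient */
    uint32_t rem;  /* remainder of the whole 64/32 division */
} div64_result;

/* Hypothetical helper modelling the two-step unsigned division. */
static div64_result div64_by_32(uint32_t hi, uint32_t lo, uint32_t divisor)
{
    div64_result r;
    assert(divisor != 0);  /* the DIV instruction would fault here */

    /* Step 1: divide 0:hi by the divisor (first DIV). */
    r.hi = hi / divisor;
    uint32_t carry = hi % divisor;

    /* Step 2: divide carry:lo by the divisor (second DIV).
       carry < divisor, so the quotient fits in 32 bits. */
    uint64_t partial = ((uint64_t)carry << 32) | lo;
    r.lo = (uint32_t)(partial / divisor);
    r.rem = (uint32_t)(partial % divisor);
    return r;
}
```

As a quick check, `div64_by_32(1, 0, 3)` divides 2^32 by 3 and gives hi = 0, lo = 0x55555555, rem = 1.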
Of course this could all be done better with MMX
MMX?
Did you mean FPU?
Of course there might be several algos to do it with MMX, but I haven't seen one that is at least as fast as the FPU.
The only other algo I've seen to do it used MMX. I haven't seen an FPU version although I imagine it would be relatively easy to do.
FPU version is too easy.
However, if you want 'unsigned' operation, it is not a solution. fild always treats the MSB as the sign bit.
Anyway, it is usually much faster than integer ops + conditional jumps. I once timed it against __divdi3() in libgcc (with 64bit divisor), and FPU version is at least twice as fast as __divdi3(). (Of course, this is not a fair comparison to an optimized integer ops version if there is one.)
fild qword ptr [64bit int]
fild dword ptr [32bit int]
fdivp ; fdiv if MASM
fistp qword ptr [64bit result]
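To illustrate the sign caveat, here is a small C sketch. The helper name `div_as_signed` is hypothetical, and it models only how the operands are interpreted: both are read as signed, as fild does. It does not model the FPU's rounding; C integer division truncates toward zero, while fistp rounds according to the current FPU rounding mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: divide with the 64-bit pattern viewed as signed,
   which is how a signed 64-bit load would see it. */
static int64_t div_as_signed(uint64_t dividend_bits, int32_t divisor)
{
    int64_t dividend;
    assert(divisor != 0);

    /* Same 64 bits, reinterpreted as a signed value. */
    memcpy(&dividend, &dividend_bits, sizeof dividend);

    /* INT64_MIN / -1 does not fit in 64 bits; exclude it in this sketch. */
    assert(!(dividend == INT64_MIN && divisor == -1));
    return dividend / divisor;
}
```

With the top bit set, the signed view disagrees with the unsigned one: `0x8000000000000000 / 2` is `0x4000000000000000` as unsigned, but `div_as_signed(0x8000000000000000, 2)` returns a negative value.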
Thanks!
This is my final implementation.
Remainder is in ecx.
;In <- edx:eax = 64 bit dividend
; <- ecx = 32bit divisor
push eax
mov eax, edx
xor edx, edx
div ecx
xchg eax, [esp]
div ecx
xchg edx, [esp]
pop ecx
;Out -> edx:eax = 64 bit quotient.
;Out -> ecx = remainder
/SgtPepper
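Since the original question asked for a rounded result with a flag, here is one way the remainder could be used for that, sketched in C. This assumes "rounded" means round to nearest with ties rounded up, which the thread never spells out; the name `div64_by_32_rounded` and the `inexact` flag are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 64/32 division rounded to nearest
   (ties up). *inexact is set to 1 when the division left a remainder. */
static uint64_t div64_by_32_rounded(uint64_t dividend, uint32_t divisor,
                                    int *inexact)
{
    assert(divisor != 0);

    uint64_t quotient = dividend / divisor;
    uint32_t remainder = (uint32_t)(dividend % divisor);

    /* Flag: the exact result was not an integer. */
    *inexact = (remainder != 0);

    /* Round up when 2 * remainder >= divisor. Written as a subtraction
       so the comparison cannot overflow 32 bits. */
    if (remainder >= divisor - remainder)
        quotient += 1;

    return quotient;
}
```

For example, 7 / 2 gives 4 with the flag set, 9 / 4 gives 2 with the flag set, and 8 / 4 gives 2 with the flag clear.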