Is there a fast sqrt arithm of a 32bit integer?

Posted on 2003-08-25 00:01:10 by lovelypp
Posted on 2003-08-25 00:38:32 by bitRAKE
thanks!
Posted on 2003-09-15 09:03:43 by lovelypp
lovelypp,

Check this site out. Ratch

http://www.bmath.net/bmath/index.html
Posted on 2003-09-15 09:16:38 by Ratch
``````
sqrt32: ; returns: eax=Square root
mov ebx,32768 ; initial guess
mov edi,ebx
mov ecx,eax
.top:
mov eax,ecx
xor edx,edx
div ebx
shr eax,1
cmp eax,ebx
je .finished
cmp eax,edi
mov edi,ebx
mov ebx,eax
jne .top
cmp eax,edi
jl .finished
mov eax,edi
.finished:
ret
``````

MATRIX
Posted on 2004-09-29 19:00:47 by >Matrix<
You may also want to look at the following procedure which is part of a fixed point math library available at the very bottom of the following site page (mixlib09.zip):

http://www.movsd.com/source.htm

``````sqrt32 proc public uInt:DWORD

; -------------------------------------------------------------
; Returns the square root of a 32-bit unsigned integer
; The result is rounded to the closest integer.
;
; eax, ebx, ecx, edx used
;
; EXAMPLE: invoke sqrt32, 32_bit_value
;
; Returns result in eax
;
; -------------------------------------------------------------

mov edx,uInt
xor eax,eax
mov ebx,40000000h
mov ecx,16
sqr10:
cmp eax,edx
ja  sqr40
sqr20:
sub edx,eax
jnz sqr50
sqr40:
sub eax,ebx
sqr50:
shr eax,1
shr ebx,2
dec ecx
jnz sqr10

cmp eax,edx
jae @F
inc eax

@@:
ret

sqrt32 endp``````

Another similar procedure is also available to extract the square root of 64-bit integers (albeit slower).

Raymond
Posted on 2004-09-29 21:32:20 by Raymond
Hi,

Or alternate fpu command way;

``````
.data
numb	dd 0

.code

fldz
push 32768
pop [numb]
fild dword ptr [numb]
fsqrt
``````

Best Regards,
Posted on 2004-09-30 00:24:49 by CYDONIA
Greetings,

This is an alternative approach based on an article which appeared
in Dr. Dobbs some time ago:

``````
sqroot PROC USES edi N:DWORD
mov    edi, N
mov    eax, 1
cmp    edi, 2
jle    short @F

mov    ecx, 4
mov    edx, 5
@1:
cmp    edi, ecx
jle    short @F

add    ecx, edx    ;edx = 2*N-1 ecx = N^2 eax = N-1
inc    eax
jmp	 short @1
@@:
jl     short @2
inc    eax
@2:
ret
sqroot ENDP
``````
Posted on 2004-10-03 00:01:13 by Poimander
Greetings,

This is an alternative approach based on an article which appeared
in Dr. Dobbs some time ago:

``````
sqroot PROC USES edi N:DWORD
mov    edi, N
mov    eax, 1
cmp    edi, 2
jle    short @F

mov    ecx, 4
mov    edx, 5
@1:
cmp    edi, ecx
jle    short @F

add    ecx, edx    ;edx = 2*N-1 ecx = N^2 eax = N-1
inc    eax
jmp	 short @1
@@:
jl     short @2
inc    eax
@2:
ret
sqroot ENDP
``````

Hy Poimander
nice, you say it is size optimized?
not the fastest way but we hope its working :)
have you compared the speeds yet?
what do you think what will be the difference in speed trying for example 8 divisions, or making 30000 additions?

MATRIX
Posted on 2004-10-03 08:32:39 by >Matrix<
Greetings >Matrix<,

The code isn't optimal and is suitable for relatively "small" integer inputs.
However it can be used as the basis for a more optimal version.
Posted on 2004-10-03 12:40:29 by Poimander
On average, the fastest way of extracting the square root of a random 32-bit integer will always be to use the FPU's fsqrt instruction.

With the CPU, the algo I posted previously can be improved slightly, and the speed improved primarily by unrolling the loop. Doing 32 shifts and additions should be significantly faster than doing 8 slow divisions.

Raymond
Posted on 2004-10-03 22:25:59 by Raymond
Well, my previous code submission is as slow as molasses even after
souping it up alittle. I'm not exactly an optimization expert. As an
alternative I developed a mod of Raymond's sqrt32 routine. The resulting
code varies from about 0% to 50% faster over the entire 32bit input range
depending on the input.

``````
;------------------------------------------------
; Modified version of Raymond's nearest integer square root
; routine 'sqrt32'.
;
; Basic idea: Enhance computation by using the highest bit set in
; the operand to compute a suitable upper bound.
;
; Destroys ecx and edx
; Result returned in eax
;------------------------------------------------
sqrt32b proc public USES ebx uInt:DWORD
xor   eax, eax
mov   edx, uInt

bsr   ecx, edx
and   ecx, 0FFFFFFFEh
mov   ebx, 1
shl   ebx, cl
shr   ecx, 1
inc   ecx
@@:
cmp   eax, edx
ja    short @@1
sub   edx, eax
jnz   short @@2
@@1:
sub   eax, ebx
@@2:
shr   eax, 1
shr   ebx, 2
dec   ecx
jnz   short @B

cmp   eax, edx
jae   short @F
inc   eax
@@:
ret
sqrt32b endp
``````
Posted on 2004-10-07 16:13:09 by Poimander
I remember seeing some cool square root algos on a Polish site, so I went on a hunt...

http://faqs.org.ru/progr/graph/ddesign4.htm

...found some stuff, but not the Polish site. IIRC, it had fast psuedo-root code as well -- when percision is not needed.
Posted on 2004-10-07 17:47:57 by bitRAKE
To make our initial guess at the sqrt, we could use the BSR (bit scan reverse) opcode, which is a sort of crude base-2 logarithm. Say the input is in EAX.
``````
bsr ecx,eax   ;If eax<>0, this puts a value of 0 to 31 in ecx, namely
;the index of the most significant set bit in eax.
jnz eax_was_zero
shr ecx,1  ;For the square root, divide the "logarithm" in half.
jz eax_was_one
xor ebx,ebx
stc
rcl ebx,cl  ;Now ebx is our initial guess.
``````
Posted on 2004-10-13 01:00:01 by LarryH
``````    .586
.model flat, stdcall
option casemap:none
;================================================
.nolist
include     \masm32\include\windows.inc
include     \masm32\include\kernel32.inc
include     \masm32\include\user32.inc

includelib  \masm32\lib\kernel32.lib
includelib  \masm32\lib\user32.lib

include     \masm32\macros\macros.asm
;================================================
.listall
.code
;================================================
Sqrt32  proc uses ebx ecx edx   iValue

mov ecx, 8000h
mov ebx, ecx
@@:
mov eax, ebx
mul ebx

.if eax > iValue
mov eax, ecx
not eax
and ebx, eax
.endif

shr ecx, 1
or  ebx, ecx

.if ecx
jmp @B
.endif

mov eax, ebx

ret

Sqrt32  endp
;================================================
WinMain proc

local   zTmp[16]:byte

mov     eax, 213423905
invoke  Sqrt32, eax

ret

WinMain endp
;================================================
start:
invoke  WinMain
invoke  ExitProcess, 0
end start
;================================================``````
Posted on 2004-10-15 04:18:35 by Kestrel
If you don't mind some inaccuracy, Andre LaMothe in one of his huge game programming books used this short routine to extract a square root of a 32bit floating point number. It can easily be converted from and to an integer with the fild and fst instructions which might round off some of the error. According to this the square root of 144 is 12.5 instead of 12 so use with caution.

``````
.data ?
float_variable real4 ?
.code
mov eax, float_variable
sub eax, 3f800000h
sar eax,1
mov float_variable, eax  ``````
Posted on 2004-10-19 12:40:19 by x_filed
Cool snippet - wish I thought of it myself. :) It works because floating point numbers are biased and the top bit is missing.

So it converts:

1.A x 2^B

to

x 2^INT(B/2)

144 = 10010000y

144 = 1.001y x 2^7

...and the routine generates:

1.1001y x 2^3

Where does the extra bit come from? Well, if the exponent is odd then the routine adds 0.5 to the mantissa.
Posted on 2004-10-19 13:00:20 by bitRAKE
I just whipped it up...
``````Sqrt PROC uint32:DWORD
xor ecx, ecx
xor eax, eax
bsr edx, uint32
and edx, -2	; even
bts ecx, edx

sub uint32, eax
jnc s1
sub eax, ecx
jmp s2