Is there a fast sqrt arithm of a 32bit integer?

thanks in advance
Posted on 2003-08-25 00:01:10 by lovelypp
Posted on 2003-08-25 00:38:32 by bitRAKE
thanks!
Posted on 2003-09-15 09:03:43 by lovelypp
lovelypp,

Check this site out. Ratch

http://www.bmath.net/bmath/index.html
Posted on 2003-09-15 09:16:38 by Ratch


sqrt32: ; returns: eax=Square root
mov ebx,32768 ; initial guess
mov edi,ebx
mov ecx,eax
.top:
mov eax,ecx
xor edx,edx
div ebx
add eax,ebx
shr eax,1
cmp eax,ebx
je .finished
cmp eax,edi
mov edi,ebx
mov ebx,eax
jne .top
cmp eax,edi
jl .finished
mov eax,edi
.finished:
ret

MATRIX
Posted on 2004-09-29 19:00:47 by >Matrix<
You may also want to look at the following procedure which is part of a fixed point math library available at the very bottom of the following site page (mixlib09.zip):

http://www.movsd.com/source.htm

sqrt32 proc public uInt:DWORD


; -------------------------------------------------------------
; Returns the square root of a 32-bit unsigned integer
; The result is rounded to the closest integer.
;
; eax, ebx, ecx, edx used
;
; EXAMPLE: invoke sqrt32, 32_bit_value
;
; Returns result in eax
;
; -------------------------------------------------------------

mov edx,uInt
xor eax,eax
mov ebx,40000000h
mov ecx,16
sqr10:
add eax,ebx
cmp eax,edx
ja sqr40
sqr20:
sub edx,eax
add eax,ebx
jnz sqr50
sqr40:
sub eax,ebx
sqr50:
shr eax,1
shr ebx,2
dec ecx
jnz sqr10

cmp eax,edx
jae @F
inc eax

@@:
ret

sqrt32 endp

Another similar procedure is also available to extract the square root of 64-bit integers (albeit slower).

Raymond
Posted on 2004-09-29 21:32:20 by Raymond
Hi,

Or alternate fpu command way;



.data
numb dd 0
quad dd 0

.code

fldz
push 32768
pop [numb]
fild dword ptr [numb]
fsqrt
fistp dword ptr [quad]


Best Regards,
Posted on 2004-09-30 00:24:49 by CYDONIA
Greetings,

This is an alternative approach based on an article which appeared
in Dr. Dobbs some time ago:



sqroot PROC USES edi N:DWORD
mov edi, N
mov eax, 1
cmp edi, 2
jle short @F

mov ecx, 4
mov edx, 5
@1:
cmp edi, ecx
jle short @F

add ecx, edx ;edx = 2*N-1 ecx = N^2 eax = N-1
add edx, 2
inc eax
jmp short @1
@@:
jl short @2
inc eax
@2:
ret
sqroot ENDP
Posted on 2004-10-03 00:01:13 by Poimander
Greetings,

This is an alternative approach based on an article which appeared
in Dr. Dobbs some time ago:



sqroot PROC USES edi N:DWORD
mov edi, N
mov eax, 1
cmp edi, 2
jle short @F

mov ecx, 4
mov edx, 5
@1:
cmp edi, ecx
jle short @F

add ecx, edx ;edx = 2*N-1 ecx = N^2 eax = N-1
add edx, 2
inc eax
jmp short @1
@@:
jl short @2
inc eax
@2:
ret
sqroot ENDP


Hy Poimander
nice, you say it is size optimized?
not the fastest way but we hope its working :)
have you compared the speeds yet?
what do you think what will be the difference in speed trying for example 8 divisions, or making 30000 additions?

MATRIX
Posted on 2004-10-03 08:32:39 by >Matrix<
Greetings >Matrix<,

The code isn't optimal and is suitable for relatively "small" integer inputs.
However it can be used as the basis for a more optimal version.
Posted on 2004-10-03 12:40:29 by Poimander
On average, the fastest way of extracting the square root of a random 32-bit integer will always be to use the FPU's fsqrt instruction.

With the CPU, the algo I posted previously can be improved slightly, and the speed improved primarily by unrolling the loop. Doing 32 shifts and additions should be significantly faster than doing 8 slow divisions.

Raymond
Posted on 2004-10-03 22:25:59 by Raymond
Well, my previous code submission is as slow as molasses even after
souping it up alittle. I'm not exactly an optimization expert. As an
alternative I developed a mod of Raymond's sqrt32 routine. The resulting
code varies from about 0% to 50% faster over the entire 32bit input range
depending on the input.



;------------------------------------------------
; Modified version of Raymond's nearest integer square root
; routine 'sqrt32'.
;
; Basic idea: Enhance computation by using the highest bit set in
; the operand to compute a suitable upper bound.
;
; Destroys ecx and edx
; Result returned in eax
;------------------------------------------------
sqrt32b proc public USES ebx uInt:DWORD
xor eax, eax
mov edx, uInt

bsr ecx, edx
and ecx, 0FFFFFFFEh
mov ebx, 1
shl ebx, cl
shr ecx, 1
inc ecx
@@:
add eax, ebx
cmp eax, edx
ja short @@1
sub edx, eax
add eax, ebx
jnz short @@2
@@1:
sub eax, ebx
@@2:
shr eax, 1
shr ebx, 2
dec ecx
jnz short @B

cmp eax, edx
jae short @F
inc eax
@@:
ret
sqrt32b endp
Posted on 2004-10-07 16:13:09 by Poimander
I remember seeing some cool square root algos on a Polish site, so I went on a hunt...

http://faqs.org.ru/progr/graph/ddesign4.htm

...found some stuff, but not the Polish site. IIRC, it had fast psuedo-root code as well -- when percision is not needed.
Posted on 2004-10-07 17:47:57 by bitRAKE
To make our initial guess at the sqrt, we could use the BSR (bit scan reverse) opcode, which is a sort of crude base-2 logarithm. Say the input is in EAX.


bsr ecx,eax ;If eax<>0, this puts a value of 0 to 31 in ecx, namely
;the index of the most significant set bit in eax.
jnz eax_was_zero
shr ecx,1 ;For the square root, divide the "logarithm" in half.
jz eax_was_one
xor ebx,ebx
stc
rcl ebx,cl ;Now ebx is our initial guess.
Posted on 2004-10-13 01:00:01 by LarryH
    .586

.model flat, stdcall
option casemap:none
;================================================
.nolist
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc

includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

include \masm32\macros\macros.asm
;================================================
.listall
.code
;================================================
Sqrt32 proc uses ebx ecx edx iValue

mov ecx, 8000h
mov ebx, ecx
@@:
mov eax, ebx
mul ebx

.if eax > iValue
mov eax, ecx
not eax
and ebx, eax
.endif

shr ecx, 1
or ebx, ecx

.if ecx
jmp @B
.endif

mov eax, ebx

ret

Sqrt32 endp
;================================================
WinMain proc

local zTmp[16]:byte

mov eax, 213423905
invoke Sqrt32, eax

invoke wsprintf, addr zTmp, SADD("%i"), eax
invoke MessageBox, 0, addr zTmp, SADD("Title"), MB_OK
ret

WinMain endp
;================================================
start:
invoke WinMain
invoke ExitProcess, 0
end start
;================================================
Posted on 2004-10-15 04:18:35 by Kestrel
If you don't mind some inaccuracy, Andre LaMothe in one of his huge game programming books used this short routine to extract a square root of a 32bit floating point number. It can easily be converted from and to an integer with the fild and fst instructions which might round off some of the error. According to this the square root of 144 is 12.5 instead of 12 so use with caution.






.data ?
float_variable real4 ?
.code
mov eax, float_variable
sub eax, 3f800000h
sar eax,1
add eax, 3f800000h
mov float_variable, eax
Posted on 2004-10-19 12:40:19 by x_filed
Cool snippet - wish I thought of it myself. :) It works because floating point numbers are biased and the top bit is missing.

So it converts:

1.A x 2^B

to

x 2^INT(B/2)


144 = 10010000y

144 = 1.001y x 2^7

...and the routine generates:

1.1001y x 2^3

Where does the extra bit come from? Well, if the exponent is odd then the routine adds 0.5 to the mantissa.
Posted on 2004-10-19 13:00:20 by bitRAKE
I just whipped it up...
Sqrt PROC uint32:DWORD

xor ecx, ecx
xor eax, eax
bsr edx, uint32
and edx, -2 ; even
bts ecx, edx

s0: add eax, ecx
sub uint32, eax
jnc s1
add uint32, eax
sub eax, ecx
jmp s2

s1: add eax, ecx
s2: shr eax, 1
shr ecx, 2
jne s0
ret
Sqrt ENDP
This in the itterative method that one might use by hand with pencil and paper; but in binary and optimized. The algorithm is undefined at zero.

I don't know much about IC design, but I think this algorithm could have a space saving over the implementation defined in the following document:
http://venus.elfak.ni.ac.yu/projects/IMPEG/DigitalSystem.pdf
Posted on 2005-01-05 01:06:31 by bitRAKE