Hello everyone,
I'm looking for source code for the "Newton square root" algorithm,
written in 8086/88 assembly.
Thanks to anyone who helps.
Posted on 2005-07-02 08:43:25 by Umen
Here's a short SSE2 function that uses scalar double-precision FP values to get an approximate square root. If you don't have a processor with SSE2 extensions, use FPU instructions instead.

Usage:
DATA
dVal dq 65535.0

CODE
    push 8      ;number of iterations (aka how accurate you want the answer to be)
    push dVal   ;pointer to the double to find the sqrt (aka mem address of double)
    call NewtonSquareRoot
    push dword
    push dword
    push dfmt
    call
    push 0
    push dfmt
    push dfmt
    push 0
    call        ;keep console program from closing on you after it outputs result

NewtonSquareRoot:
;SSE2 because the FPU stack can bite me :D
;ARG1 IN/OUT ptr to qword double value, at [esp+4]
;ARG2 IN     number of iterations to perform, at [esp+8]
      mov eax,[esp+4]
      MOVQ xmm0,qword [eax]
      mov ecx,[esp+8]
      MOVQ xmm1,xmm0 ;save
      DIVSD xmm0,qword [GetGuess] ;get initial guess
      MOVQ xmm2,qword [Double2]   ;used in all iterations
      MOVQ xmm3,xmm1
;xmm3 & xmm1 = number ;; xmm2 = 2.0 ;; xmm0 = guess of root
.LOOPIT: ;new x = 1/2(x + b/x)
      DIVSD xmm1,xmm0 ; = num/guess
      ADDSD xmm1,xmm0 ; = result + guess
      DIVSD xmm1,xmm2 ; = / 2 = new guess
      MOVQ xmm0,xmm1  ; mov guess into xmm0
      MOVQ xmm1,xmm3  ; mov num back into xmm1
      dec ecx
      jnz .LOOPIT
      MOVQ qword [eax],xmm0 ;the best guess after ECX # of iterations
      retn 8
align 16  ; align for insignificantly faster MOVQ
      GetGuess dq 3.0 ;divide by this to get an initial guess
align 16
      Double2  dq 2.0 ; load 2 so we can divide by it


Even with all the write stalls, it'll run faster than doing it with the FPU.
Using another xmm reg (to save the number), you can unroll the iteration loop to do 2 iterations at a time if you want to squeeze some extra clock cycles out of it.
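
For readers who want to check the arithmetic without an SSE2 machine, here is a minimal C sketch of the same recurrence. It is not r22's code: `newton_sqrt` is a hypothetical name, and the sketch assumes b > 0 and at least one iteration. It keeps the b/3 initial guess and the fixed iteration count of the routine above.

```c
/* Newton (Heron) square root with a fixed number of iterations.
   Hypothetical helper for illustration; assumes b > 0 and iterations >= 1. */
double newton_sqrt(double b, unsigned iterations)
{
    double x = b / 3.0;              /* initial guess, like GetGuess */
    while (iterations-- > 0) {
        x = (x + b / x) / 2.0;       /* new x = 1/2 (x + b/x) */
    }
    return x;
}
```

With b = 65535.0 the starting guess 21845 is far from the root, and the early steps roughly halve the guess each time, so `newton_sqrt(65535.0, 8)` lands near 257.27 while the true root is about 255.998. Around 12 iterations are needed before the result is good to double precision, so the iteration count has to grow with the size of the input.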
Posted on 2005-07-02 22:14:23 by r22
Thanks for the fast reply,
but I'm a student and I need the source in basic assembly code:
8086.


Thanks again.
Posted on 2005-07-03 14:29:53 by Umen
So you wanted to cheat on your assignment eh? Why didn't you just say so, you could have saved both yourself and us a lot of time.

Mirno
Posted on 2005-07-04 05:15:29 by Mirno
Nevertheless, I think r22's example is a good one.  ;)
Posted on 2005-07-04 06:54:13 by roticv
Thanks Roticv, I was waiting for someone to say that.
Posted on 2005-07-04 12:22:48 by r22
R22's example may be excellent but, apart from a curiosity or practice in coding with SSE instructions, I don't immediately see the advantage of using such a complex procedure to obtain an "estimate" of a square root when the more exact one can be obtained with simply the fsqrt FPU instruction.

Raymond
Posted on 2005-07-04 20:36:02 by Raymond
Raymond, all you had to do was read the first post.

looking for newton square root in assembly
on: July 02, 2005, 02:43:25 PM

--------------------------------------------------------------------------------
Hello everyone,
I'm looking for source code for the "Newton square root" algorithm,
written in 8086/88 assembly.
Thanks to anyone who helps.


It's an implementation of an algorithm, not a viable square root solution.
The algorithm itself would only be useful if you were making some sort of BigFloat math library and you wanted to get square roots of varying but very precise accuracy.
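
For reference, the update in the loop comment (new x = 1/2(x + b/x)) is Newton's method applied to f(x) = x^2 - b. The derivation below is the standard one and is not taken from the thread; x_n denotes the n-th guess.

```latex
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}
        = x_n - \frac{x_n^2 - b}{2x_n}
        = \frac{1}{2}\left(x_n + \frac{b}{x_n}\right),
\qquad
x_{n+1} - \sqrt{b} = \frac{\left(x_n - \sqrt{b}\right)^2}{2x_n}.
```

The second identity says the error is squared at every step once the guess is close, so the number of correct digits roughly doubles per iteration. That is what makes the method attractive when the target precision varies.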

ACTUALLY A CORRECTION
This SSE2 code runs faster than the FSQRT FPU opcode on my 3.2 GHz P4.


data
dVal dq 40000400.0

code
    push 10     ;20 iterations (2x unrolled)
    push dVal
    call NewtonSquareRoot

align 16 ; unrolled for a slight speed boost; with ECX = 10 it runs faster and is just as accurate as FSQRT
NewtonSquareRoot:
;SSE2 because the FPU stack can bite me :D
;ARG1 IN/OUT ptr to qword double value, at [esp+4]
;ARG2 IN     loop count, at [esp+8]; each pass does 2 iterations (2x unroll)
      mov eax,[esp+4]
      MOVQ xmm0,qword [eax]
      mov ecx,[esp+8]
      MOVQ xmm1,xmm0 ;save
      DIVSD xmm0,qword [GetGuess] ;get initial guess
      MOVQ xmm2,qword [Double2]   ;used in all iterations
      MOVQ xmm3,xmm1
      MOVQ xmm4,xmm1  ;added for unroll
;xmm3 & xmm4 & xmm1 = number ;; xmm2 = 2.0 ;; xmm0 = guess of root
.LOOPIT: ;new x = 1/2(x + b/x)
      DIVSD xmm1,xmm0 ; = num/guess
      ADDSD xmm1,xmm0 ; = result + guess
      DIVSD xmm1,xmm2 ; = / 2 = new guess
      ;iteration 2 unroll
      DIVSD xmm3,xmm1
      ADDSD xmm3,xmm1
      DIVSD xmm3,xmm2
      MOVQ xmm1,xmm4  ; mov num back into xmm1
      MOVQ xmm0,xmm3  ; mov guess into xmm0
      MOVQ xmm3,xmm4  ; mov num back into xmm3
      dec ecx
      jnz .LOOPIT
      MOVQ qword [eax],xmm0 ;the best guess after 2*ECX iterations
      retn 8
align 16  ; align for insignificantly faster MOVQ
      GetGuess dq 3.0 ;divide by this to get an initial guess
align 16
      Double2  dq 2.0 ; load 2 so we can divide by it
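
To make the iteration count of the unrolled loop explicit, here is a rough C equivalent. It is only a sketch, not r22's code: `newton_step` and `newton_sqrt_unrolled` are hypothetical names, and b > 0 and passes >= 1 are assumed. The `passes` argument plays the role of ECX, so the routine performs 2 * passes Newton steps in total.

```c
/* One Newton step for sqrt(b): x -> (x + b/x) / 2. */
double newton_step(double b, double x)
{
    return (x + b / x) / 2.0;
}

/* Two steps per loop pass, mirroring the 2x unrolled assembly loop.
   Hypothetical helper; assumes b > 0 and passes >= 1. */
double newton_sqrt_unrolled(double b, unsigned passes)
{
    double x = b / 3.0;               /* initial guess, like GetGuess */
    while (passes-- > 0) {
        x = newton_step(b, x);        /* iteration 1 of this pass */
        x = newton_step(b, x);        /* iteration 2 of this pass */
    }
    return x;
}
```

Calling `newton_sqrt_unrolled(40000400.0, 10)` corresponds to the `push 10` usage above, which is 20 iterations.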
Posted on 2005-07-04 21:39:16 by r22
hahaha Umen
Posted on 2005-07-05 00:10:01 by comrade

R22's example may be excellent but, apart from a curiosity or practice in coding with SSE instructions, I don't immediately see the advantage of using such a complex procedure to obtain an "estimate" of a square root when the more exact one can be obtained with simply the fsqrt FPU instruction.

Raymond


Actually, I haven't looked at the code closely, but in terms of accuracy, this method can yield as accurate a result as you want, as long as you keep iterating and storing the result (as the accuracy increases, the operand size will increase). The FPU instruction, on the other hand, is limited in precision to the register size you are using. Furthermore, the procedure may seem complex, but it is surprisingly simple, and may be more efficient than the FPU instruction in some cases. The same is true of other iterative approximations. The FPU assumes that you want operands of certain set precisions, which in most cases is true.
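
The precision point can be seen in a small experiment. This is a sketch in C, not anything posted in the thread: `newton_sqrt_converged` is a hypothetical name, b > 0 is assumed, and the 200-step cap is an arbitrary safety limit. It iterates until a step no longer lowers the guess, which is where a fixed-width double runs out of bits. Going past that point would need wider operands, for example a big-number library.

```c
/* Iterate until the guess stops decreasing and report how many steps
   were computed. Hypothetical helper; assumes b > 0.
   steps_out may be NULL (0) if the count is not wanted. */
double newton_sqrt_converged(double b, unsigned *steps_out)
{
    double x = b / 3.0;                 /* same initial guess as above */
    double next = (x + b / x) / 2.0;    /* first step may move up or down */
    unsigned n = 1;

    /* After the first step the guesses approach the root from above,
       so stop as soon as a step fails to lower the guess. */
    do {
        x = next;
        next = (x + b / x) / 2.0;
        n++;
    } while (next < x && n < 200);      /* 200 is an arbitrary cap */

    if (steps_out) {
        *steps_out = n;
    }
    return x;
}
```

For inputs of ordinary size the step count stays small, because the error is roughly squared at each step once the guess is near the root.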
Posted on 2005-07-12 16:51:29 by Gandolf

hahaha Umen

Just to clarify comrade's post: "umen" means "clever/smart" in Russian and Bulgarian (and who knows in how many more Slavic languages) :)

Umen, I can't understand why they torture you with 8086/88. It would've crippled me as an asm coder if I had been taught that.
And you should've read the rules of the board: no homework accepted, for a good reason.
Posted on 2005-07-15 20:02:08 by Ultrano
The SQRTSD instruction is (of course) wildly faster than both FSQRT and the SSE2 version of the Newton algo.

I went from being a VB programmer (if you can even call it programming :D) to win32asm.
A friend taught me the basics, then I just kept learning with the help of a decompiler and Intel's and AMD's giant PDF files. Not starting with 16-bit or any formal training really increased the learning curve.

Classes on ASM should really allow SSE into the curriculum. I know when I started learning it, some of the documentation was pretty vague (not knowing what packed or scalar meant didn't help things).

Did anyone else notice that manuals like the one that comes with NetwideASM, and some Intel manuals, say that PSLLQ shifts the contents of the register by bytes instead of bits, when in reality it does no such thing? Was there some mix-up?
Posted on 2005-07-16 23:56:00 by r22

................
Classes on ASM should really allow SSE into the curriculum. I know when I started learning it, some of the documentation was pretty vague (not knowing what packed or scalar meant didn't help things).



Maybe in the next 5 years it will be here.

Posted on 2005-07-29 12:35:28 by realvampire