Hello everyone,

I'm looking for source code for the "Newton square root" algorithm, written in 8086/88 assembly.

Thanks to the helpers.


Here's a short SSE2 function that uses scalar double-precision FP values to compute an approximate square root. If your processor doesn't have the SSE2 extensions, use FPU instructions instead.
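The update used in the loop, new x = 1/2(x + b/x), is Newton's method applied to f(x) = x² − b. Writing out the step, together with the error identity it implies (for b > 0 and a positive guess x_n):

$$
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - b}{2x_n} = \frac{1}{2}\left(x_n + \frac{b}{x_n}\right),
\qquad
x_{n+1} - \sqrt{b} = \frac{\left(x_n - \sqrt{b}\right)^2}{2x_n}.
$$

The second identity shows why a fixed iteration count is workable: after the first step every guess is at least √b, and once the guess is close, the error is roughly squared at each step. When the starting guess is far too large (as b/3 is for big b), the early steps only about halve it.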


Usage:

```asm
DATA
dVal    dq 65535.0

CODE
        push    8               ;number of iterations (aka how accurate you want the answer to be)
        push    dVal            ;pointer to the double to find the sqrt (aka mem address of double)
        call    NewtonSquareRoot

        push    dword [dVal+4]  ;high dword of the result
        push    dword [dVal]    ;low dword of the result
        push    dfmt
        call

        push    0
        push    dfmt
        push    dfmt
        push    0
        call                    ;keep console program from closing on you after it outputs result
```

```asm
NewtonSquareRoot:
;SSE2 because the FPU stack can bite me :D
;ARG1 IN/OUT ptr to qword double value   [esp+4]
;ARG2 IN     number of iterations to perform   [esp+8]
        mov     eax, [esp+4]
        MOVQ    xmm0, qword [eax]
        mov     ecx, [esp+8]
        MOVQ    xmm1, xmm0                  ;save
        DIVSD   xmm0, qword [GetGuess]      ;get initial guess (num / 3.0)
        MOVQ    xmm2, qword [Double2]       ;used in all iterations
        MOVQ    xmm3, xmm1
;xmm3 & xmm1 = number ;; xmm2 = 2.0 ;; xmm0 = guess of root
.LOOPIT:                                    ;new x = 1/2(x + b/x)
        DIVSD   xmm1, xmm0                  ; = num/guess
        ADDSD   xmm1, xmm0                  ; = result + guess
        DIVSD   xmm1, xmm2                  ; = / 2 = new guess
        MOVQ    xmm0, xmm1                  ; mov guess into xmm0
        MOVQ    xmm1, xmm3                  ; mov num back into xmm1
        dec     ecx
        jnz     .LOOPIT
        MOVQ    qword [eax], xmm0           ;the best guess after ARG2 iterations
        retn    8

align 16                                    ; align for insignificantly faster MOVQ
        GetGuess dq 3.0                     ;divide by this to get an initial guess
align 16
        Double2  dq 2.0                     ; load 2 so we can divide by it
```

Even with all the write stalls, it'll run faster than doing it with the FPU.

Using another XMM register (to save a second copy of the number), you can unroll the iteration loop to do 2 iterations at a time if you want to squeeze some extra clock cycles out of it.

Thanks for the fast reply,

but I'm a student and I need the source in basic 8086 assembly code.

Thanks again.


So you wanted to cheat on your assignment, eh? Why didn't you just say so? You could have saved both yourself and us a lot of time.

Mirno


Nevertheless, I think r22's example is a good one. ;)

Thanks Roticv, I was waiting for someone to say that.

R22's example may be excellent, but apart from being a curiosity or practice in coding with SSE instructions, I don't immediately see the advantage of using such a complex procedure to obtain an "estimate" of a square root when a more exact one can be obtained simply with the **fsqrt** FPU instruction.

Raymond

Raymond, all you had to do was read the first post.


looking for newton square root in assembly

on: July 02, 2005, 02:43:25 PM

--------------------------------------------------------------------------------

Hello everyone,

I'm looking for source code for the "Newton square root" algorithm, written in 8086/88 assembly.

Thanks to the helpers.


It's an implementation of an algorithm, not a viable square root solution.

The algorithm itself would only be useful if you were writing some sort of BigFloat math library and wanted square roots of varying but very high precision.

ACTUALLY, A CORRECTION:

This SSE2 code runs faster than the FSQRT FPU opcode on my 3.2 GHz P4.

```asm
data
dVal    dq 40000400.0

code
        push    10                  ;20 iterations (UNROLLED: 10 passes x 2)
        push    dVal
        call    NewtonSquareRoot

align 16    ;unrolled for a slight speed boost; with ECX = 10 it runs faster and is just as accurate as FSQRT
NewtonSquareRoot:
;SSE2 because the FPU stack can bite me :D
;ARG1 IN/OUT ptr to qword double value   [esp+4]
;ARG2 IN     number of loop passes; each pass does 2 iterations (2x UNROLL)
        mov     eax, [esp+4]
        MOVQ    xmm0, qword [eax]
        mov     ecx, [esp+8]
        MOVQ    xmm1, xmm0                  ;save
        DIVSD   xmm0, qword [GetGuess]      ;get initial guess (num / 3.0)
        MOVQ    xmm2, qword [Double2]       ;used in all iterations
        MOVQ    xmm3, xmm1
        MOVQ    xmm4, xmm1                  ;added for unroll
;xmm3 & xmm4 & xmm1 = number ;; xmm2 = 2.0 ;; xmm0 = guess of root
.LOOPIT:                                    ;new x = 1/2(x + b/x)
        DIVSD   xmm1, xmm0                  ; = num/guess
        ADDSD   xmm1, xmm0                  ; = result + guess
        DIVSD   xmm1, xmm2                  ; = / 2 = new guess
        ;iteration 2 (unrolled)
        DIVSD   xmm3, xmm1
        ADDSD   xmm3, xmm1
        DIVSD   xmm3, xmm2
        MOVQ    xmm1, xmm4                  ; mov num back into xmm1
        MOVQ    xmm0, xmm3                  ; mov guess into xmm0
        MOVQ    xmm3, xmm4                  ; mov num back into xmm3
        dec     ecx
        jnz     .LOOPIT
        MOVQ    qword [eax], xmm0           ;the best guess after 2*ARG2 iterations
        retn    8

align 16                                    ; align for insignificantly faster MOVQ
        GetGuess dq 3.0                     ;divide by this to get an initial guess
align 16
        Double2  dq 2.0                     ; load 2 so we can divide by it
```

hahaha Umen


Actually, I haven't looked at the code closely, but in terms of accuracy, this method can yield as accurate a result as you want, as long as you keep iterating and storing the result (as the accuracy increases, the operand size will increase). The FPU instruction, on the other hand, is limited in precision to the register size you are using. Furthermore, the procedure may seem complex, but it is surprisingly simple, and it may be more efficient than the FPU instruction in some cases. The same is true of other iterative approximations. The FPU assumes that you want operands of certain fixed precisions, which in most cases is true.


Just to clarify comrade's post: "umen" means "clever/smart" in Russian and Bulgarian (and who knows how many more Slavic languages) :)

Umen, I can't understand why they torture you with 8086/88. It would've crippled me as an asm coder if I were taught that.

And you should've read the board rules: no homework accepted, for a good reason.

The SQRTSD instruction is (of course) much faster than both FSQRT and the SSE version of the Newton algorithm.

I went from being a VB programmer (if you can even call it programming :D) to win32asm.

A friend taught me the basics, then I just kept learning with the help of a decompiler and Intel/AMD's giant PDF files. Not starting with 16-bit or any formal training really increased the learning curve.

Classes on ASM should really bring SSE into the curriculum. When I started learning it, some of the documentation was pretty vague (not knowing what packed or scalar meant didn't help things).

Did anyone else notice that manuals like the one that comes with NetwideASM, and some Intel manuals, say that PSLQ shifts the contents of the register by BYTEs instead of bits, when in reality it does no such thing? Was there some mix-up?


Maybe SSE will make it into the ASM curriculum in the next 5 years.