Thanks bitrake, I will chew on your code (though it looks pretty hard to understand). Thank you, tenkey for explaining square root in binary to me.:alright:
Posted on 2003-10-25 13:57:46 by roticv
Norbert Juffa's code might be a better read:
sqr_iter MACRO N:REQ

lea edx, [eax + 1 SHL N]
lea ebx, [eax + 2 SHL N]
shl edx, N
mov esi, ecx
sub ecx, edx
cmovnc eax, ebx
cmovc ecx, esi

; ECX is X^2

xor eax, eax ; 2*root

; iteration 15
mov ebx, 2 SHL 15
mov esi, ecx
sub ecx, 1 SHL 30
cmovnc eax, ebx
cmovc ecx, esi


; iteration 0
lea edx, [eax + 1]
lea ebx, [eax + 2]
sub ecx, edx
cmovnc eax, ebx

shr eax, 1

; EAX = X
Posted on 2003-10-25 14:37:33 by bitRAKE
Perhaps bittians are what you want. Ratch
Posted on 2003-10-25 18:29:33 by Ratch

I developed my own large integer math library and use follow method

1.) if Value <= 0 then return (Value = 0)
2.) lookup Value and $FF in a table with 256 entries, 0.171875 Candidates pass this test
3.) compute the Crosssum over Value to 2^96 -1 in S, this computation is very fast, only additions
4.) compute T = S mod (193 * 97 * 17 * 13 * 7 * 5), only <= 3 divisions 32 bit, a 3*32Bit / 32 Bit div
5.) lookup T mod 193, T mod 97, T mod 17, T mod 13, T mod 7, T mod 5, if any are'nt perfect square remainders return False
6.) compute T = S mod (9 * 673 * 257 * 241)
7.) lookup T mod 9, T mod 673, T mod 257, T mod 241, if any are'nt perfect square remainders return False
8.) now only 0.0002392 pass the test upto here, eg. 1 of 4181 candidates, all test are ordered by they probabilty of perfect squares. Upto here we need to any arbitrary Integer only 16 times a 32 Bit Division.
9.) We have now to expand the tests. Instead of 32 Bit divisions of arbitrary Integers to calc remainders to any 32 bit number, we choose a formula such that Result * (2^32)^C == Q * D - A, where D is odd. This way we could test an arbitrary Integer if it divisible to a small and odd 32 bit Number. We can't calculate in this way the exact remainder. This operations need about Ln2(Value) / 32 * (1 mul + 2 add), Value is our arbitrary integer. Thus we can test to all primes upto 257 in chunks. The final probability that a numer pass all these tests are 0.000000000000000008613904520439 = 1 of 116091372690195275 Candidates. Thats very low. Remaining Numbers must be checked with a Square Root algorithm, but not with Remainder ! Instead of calculating the remainder we scale up our Number * 2^32, do the Square Root, test if Result mod 2^16 <> 0, and scale down Result / 2^16. The test Result mod 2^16 <> 0 get us the knownledge if it is NOT a perfect square with a probality of 1/2^16. If mod 2^16 = 0 we have to squaring Result and check with Value. If equal then we know definitely it's a perfect square. Thus we testing in Perfect Square Test if a Number is NOT a perfect Square until to the last test by Squaring again the Sqrt and comparing with Input.
The Sqrt Algorithm is thus called only 1/116091372690195275 and the final Squaring 1/(116091372690195275 * 2^16).

Testing 1 Million random 1024 bit numbers take on my P4 130 milliseconds, eg. need about 190 clockcycles per number. On average a 1 million bits number need 0.047 clockcycles per bit.

Naive algorithms to calculate the Sqrt of any arbitrary Integer are inpracticable eg. inperformant. Most used are Newton Iterations. In my library i use the so called "Recursive Karatsuba Square Root Algorithm" of (INRIA, "Karatsuba Square Root" by Paul Zimmermann Research Report No 3805, file: It's a adaption of the fast "Karatsuba Divsion Algorithm" of (Christoph Burnikel & Joachim Ziegler, MPI-I-98-1-022 October 1998, Research Report Max Planck Institut f?r Informatik).

Best Regards, Hagen
Posted on 2003-10-27 14:54:29 by Hagen
Hagen, your detailed replies and experience in these matters is greatly appreciated.
Posted on 2003-10-27 19:37:17 by bitRAKE
@bitRake, many thanks :)

But i think many times that nobody can understand my bad english !

Above informations are based on deep self expiriences with such implementations and the goal to implement the fastest possible algorithms, I'm a algorithm junky, such as You :-)

Some small different Perfect Square Test's can be found in the libraries GMP, Miracl, HFloat and so on...
But i have'nt seen any Algorithm as mine with these low probabilities. The needed Lookuptables requieres about 1024 bytes storage. That's to big for some needs.

Realy more complex are Perfect Power Tests, i suggest then to search for ("Detecting perfect powers in essentially linear time" Daniel J. Bernstein, file:

Regards, Hagen
Posted on 2003-10-28 07:37:12 by Hagen