A large unsigned integer square root routine,

as coded it works on 1024 bit unsigned integers.

To use it with other sizes requires no changes to the code,

only change the content or size of the variables.

The content of those that hold sizes of the main variables:

jbd : size of bdd

jbs : size of bsp

jbr : size of brm, brt

The size of the main variables:

bdd : number

bsp : number,spread out two bits at a time

brt : root

brm : remainder

It doesn't use fpu,mmx,3dnow,sse,sse2, or the div instruction.

Uses a single mul which occurs after the routine

to verify if a root is from a square

root 40 from the square 1600

versus

root 40 rem 3 from 1603

The code calculates 1,000,000 square roots using sequential 1024 bit unsigned integers,

which takes 0:02:37 CPU time on a 1.2 Ghz AMD Athlon.

The only output is a messagebox at the end displaying the number of iterations.

as coded it works on 1024 bit unsigned integers.

To use it with other sizes requires no changes to the code,

only change the content or size of the variables.

The content of those that hold sizes of the main variables:

jbd : size of bdd

jbs : size of bsp

jbr : size of brm, brt

The size of the main variables:

bdd : number

bsp : number,spread out two bits at a time

brt : root

brm : remainder

It doesn't use fpu,mmx,3dnow,sse,sse2, or the div instruction.

Uses a single mul which occurs after the routine

to verify if a root is from a square

root 40 from the square 1600

versus

root 40 rem 3 from 1603

The code calculates 1,000,000 square roots using sequential 1024 bit unsigned integers,

which takes 0:02:37 CPU time on a 1.2 Ghz AMD Athlon.

The only output is a messagebox at the end displaying the number of iterations.

Does your 0:02:37 timing include conversions to/from binary/ASCII or only the computation of the square roots from a binary variable?

If only from a binary variable, would you have any estimate of the time required for converting an ASCII input equivalent to a 1024-bit binary number (i.e. approx. 300 decimal digits) before extracting its square root, and the estimate of time to convert the answer back to ASCII (i.e. approx. 150 decimal digits)?

Raymond

If only from a binary variable, would you have any estimate of the time required for converting an ASCII input equivalent to a 1024-bit binary number (i.e. approx. 300 decimal digits) before extracting its square root, and the estimate of time to convert the answer back to ASCII (i.e. approx. 150 decimal digits)?

Raymond

No decimal ascii to binary and binary to decimal ascii conversion code.

Hex string to binary and binary to hex string will be done.

The 0:02:37 includes the time to initialize the number with random bytes once

also setting the high nibble to F and the low dword to 0.

The low dword does double duty as a counter.

It also includes code after the square root calc to quickly check if the root is from a square,

though I haven't determined if there are any values that it would miss and give a false positive.

This quick verify code is included at the end of each of the 1 million evaluations.

The rational of the quick verify is, if the remainder, brm, is zero and the low dword of the root, brt, is squared

and then the low dword of brt*brt == low dword of the number, bdd, then the root is from a square.

In the original delphi code is a routine to convert an even length hex string to binary,

also a routine to get the root back to a hex string,

but I haven't done the assembly code for them yet.

j := length(s) div 2;

for i := j downto 1 do begin

bdd := fromhex(s

Hex string to binary and binary to hex string will be done.

The 0:02:37 includes the time to initialize the number with random bytes once

also setting the high nibble to F and the low dword to 0.

The low dword does double duty as a counter.

It also includes code after the square root calc to quickly check if the root is from a square,

though I haven't determined if there are any values that it would miss and give a false positive.

This quick verify code is included at the end of each of the 1 million evaluations.

The rational of the quick verify is, if the remainder, brm, is zero and the low dword of the root, brt, is squared

and then the low dword of brt*brt == low dword of the number, bdd, then the root is from a square.

In the original delphi code is a routine to convert an even length hex string to binary,

also a routine to get the root back to a hex string,

but I haven't done the assembly code for them yet.

j := length(s) div 2;

for i := j downto 1 do begin

bdd := fromhex(s

*+ s**);*

end;

s := '';

for i := j downto 0 do

s := s + tohex(baend;

s := '';

for i := j downto 0 do

s := s + tohex(ba

*);*

Each conversion is a byte at a time.

I don't have conversion routines from decimal ascii to binary or binary to decimal ascii

tested with the delphi or assembly code.Each conversion is a byte at a time.

I don't have conversion routines from decimal ascii to binary or binary to decimal ascii

tested with the delphi or assembly code.

*
*Your algo may be relatively fast and accurate (although I haven't checked the accuracy myself), but what would you consider as its usefulness if it can't accept a decimal string input nor provide the answer as a decimal string? Very few humans can grasp the meaning of a long hex string.

Raymond

Raymond

I have a simple bin2ascii rotine that may help a little with those big numbers,

it isn't optimized at all, takes 528k cycles to convert a 1024 bit number to ascii and 113k cycles for a 512 bit number,

i need 1 sec to convert a 65536 bit number to ascii,

it writes the result in dst buffer and reads from src buffer (this one have a big number in little endian format), cnt is the length of src buffer (in dwords), the dst buffer must be 2.5 times larger than src and, for a zero based string, have an extra byte at the end.

it isn't optimized at all, takes 528k cycles to convert a 1024 bit number to ascii and 113k cycles for a 512 bit number,

i need 1 sec to convert a 65536 bit number to ascii,

it writes the result in dst buffer and reads from src buffer (this one have a big number in little endian format), cnt is the length of src buffer (in dwords), the dst buffer must be 2.5 times larger than src and, for a zero based string, have an extra byte at the end.

bin2ascii proc uses edi esi ebx dst:DWORD, src:DWORD, cnt:DWORD

mov ecx, cnt

lea ecx,

shl ecx, 1

mov edi, dst

@@:

invoke bigdiv, 10, src, cnt

add al, 30h

mov , al

dec ecx

jnz @B

ret

bin2ascii endp

bigdiv proc uses ebx esi edi, dvr:DWORD, dst:DWORD, cnt:DWORD

mov esi, cnt

mov edi, dst

mov ebx, dvr

xor edx, edx

@@:

mov eax,

div ebx

mov , eax

dec esi

jnz @B

mov eax, edx

ret

bigdiv endp

Thanks for the replies and code.

I found a bug in the first attachment, after checking if brt < brm it didn't

save the result so the following code brm - (brm + 1), brt + 2 never was

executed.

Time with the fixed code increased to 3:35, did a few optimizations able

to do the first 30 iterations working on a single dword size for brt, brm.

On the "Is it an exact square root" test, ie brt*brt == bdd, delayed the more expensive

tests, only does them if it gets by the first quickest test. Time dropped to 3:08.

Added routines to display last value of bdd, brt, brm in ascii hex at the end.

I found a bug in the first attachment, after checking if brt < brm it didn't

save the result so the following code brm - (brm + 1), brt + 2 never was

executed.

Time with the fixed code increased to 3:35, did a few optimizations able

to do the first 30 iterations working on a single dword size for brt, brm.

On the "Is it an exact square root" test, ie brt*brt == bdd, delayed the more expensive

tests, only does them if it gets by the first quickest test. Time dropped to 3:08.

Added routines to display last value of bdd, brt, brm in ascii hex at the end.

Added routines:

Convert from decimal (base 10) ascii str to binary.

Convert from hex (base 16) ascii str to binary.

Write hex string to logfile.

Convert from decimal (base 10) ascii str to binary.

Convert from hex (base 16) ascii str to binary.

Write hex string to logfile.