A large unsigned integer square root routine.
As coded it works on 1024-bit unsigned integers.
Using it with other sizes requires no changes to the code;
only the content or size of the variables change.
The variables that hold the sizes of the main variables:
jbd : size of bdd
jbs : size of bsp
jbr : size of brm, brt
The size of the main variables:
bdd : number
bsp : number, spread out two bits at a time
brt : root
brm : remainder
It doesn't use fpu, mmx, 3dnow, sse, sse2, or the div instruction.
It uses a single mul, which occurs after the routine
to verify whether a root came from a perfect square:
root 40 from the square 1600
versus
root 40 rem 3 from 1603
The code calculates 1,000,000 square roots using sequential 1024-bit unsigned integers,
which takes 0:02:37 of CPU time on a 1.2 GHz AMD Athlon.
The only output is a message box at the end displaying the number of iterations.
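The routine itself isn't shown in the post, so as a point of reference, here is a minimal sketch (in Python, not the original assembly) of the classic two-bits-at-a-time integer square root that the description matches; the name `isqrt_rem` and the exact loop shape are assumptions, not the author's code:

```python
def isqrt_rem(n):
    """Digit-by-digit (two bits at a time) integer square root.
    Returns (root, remainder) with root*root + remainder == n."""
    assert n >= 0
    # start with the highest power of 4 not exceeding n
    bit = (1 << ((n.bit_length() - 1) & ~1)) if n else 0
    root = 0
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2                 # move to the next lower bit pair
    return root, n
```

This reproduces the post's example: `isqrt_rem(1600)` gives root 40 with remainder 0, and `isqrt_rem(1603)` gives root 40 with remainder 3.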
Does your 0:02:37 timing include conversions to/from binary/ASCII or only the computation of the square roots from a binary variable?
If only from a binary variable, would you have any estimate of the time required for converting an ASCII input equivalent to a 1024-bit binary number (i.e. approx. 300 decimal digits) before extracting its square root, and the estimate of time to convert the answer back to ASCII (i.e. approx. 150 decimal digits)?
Raymond
There is no decimal ASCII to binary or binary to decimal ASCII conversion code.
Hex string to binary and binary to hex string will be done.
The 0:02:37 includes the time to initialize the number with random bytes once,
also setting the high nibble to F and the low dword to 0.
The low dword does double duty as a counter.
It also includes code after the square root calc to quickly check whether the root is from a square,
though I haven't determined whether there are any values it would miss and give a false positive for.
This quick verify code is included at the end of each of the 1 million evaluations.
The rationale of the quick verify: if the remainder, brm, is zero, and the low dword of the root, brt, is squared,
and the low dword of brt*brt == the low dword of the number, bdd, then the root is from a square.
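That quick verify can be modeled like this in Python (the masks and names below are mine, not the assembly):

```python
DWORD = 0xFFFFFFFF

def quick_verify(bdd, brt, brm):
    """Cheap perfect-square check: the remainder must be zero and the
    low dword of brt*brt must equal the low dword of bdd. Uses a
    single 32-bit multiply; as the post notes, a false positive is
    conceivable since only the low dword is compared."""
    if brm != 0:
        return False
    low = brt & DWORD
    return (low * low) & DWORD == bdd & DWORD
```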
The original Delphi code has a routine to convert an even-length hex string to binary,
and also a routine to get the root back to a hex string,
but I haven't done the assembly code for them yet.
j := length(s) div 2;
for i := j downto 1 do begin
  bdd[i] := fromhex(s[i*2-1] + s[i*2]);
end;
s := '';
for i := j downto 0 do
  s := s + tohex(ba[i]);
Each conversion is a byte at a time.
I don't have conversion routines from decimal ASCII to binary or binary to decimal ASCII
tested with the Delphi or assembly code.
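A byte-at-a-time hex conversion along those lines might look like this in Python (my sketch, not the Delphi code):

```python
def hex_to_bytes(s):
    """Convert an even-length hex string, two characters per byte."""
    assert len(s) % 2 == 0, "even-length hex string expected"
    return bytes(int(s[i:i + 2], 16) for i in range(0, len(s), 2))

def bytes_to_hex(b):
    """Convert bytes back to an uppercase hex string, one byte at a time."""
    return ''.join('%02X' % x for x in b)
```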
Your algo may be relatively fast and accurate (although I haven't checked the accuracy myself), but what would you consider as its usefulness if it can't accept a decimal string input nor provide the answer as a decimal string? Very few humans can grasp the meaning of a long hex string.
Raymond
I have a simple bin2ascii routine that may help a little with those big numbers.
It isn't optimized at all; it takes 528k cycles to convert a 1024-bit number to ASCII and 113k cycles for a 512-bit number,
and I need 1 sec to convert a 65536-bit number to ASCII.
It writes the result into the dst buffer and reads from the src buffer (which holds a big number in little-endian format); cnt is the length of the src buffer in dwords. The dst buffer must be 2.5 times larger than src and, for a zero-terminated string, have an extra byte at the end.
bin2ascii proc uses edi esi ebx dst:DWORD, src:DWORD, cnt:DWORD
    mov ecx, cnt
    lea ecx, [ecx+ecx*4]        ; ecx = cnt*5
    shl ecx, 1                  ; ecx = cnt*10 = number of decimal digits
    mov edi, dst
@@:
    invoke bigdiv, 10, src, cnt ; remainder (next digit) returned in eax
    add al, 30h                 ; convert digit to ASCII
    mov [edi+ecx-1], al         ; store digits from the end backwards
    dec ecx
    jnz @B
    ret
bin2ascii endp
bigdiv proc uses ebx esi edi, dvr:DWORD, dst:DWORD, cnt:DWORD
    mov esi, cnt
    mov edi, dst
    mov ebx, dvr
    xor edx, edx                ; running remainder
@@:
    mov eax, [edi+esi*4-4]      ; next dword, most significant first
    div ebx                     ; edx:eax / ebx -> quotient eax, remainder edx
    mov [edi+esi*4-4], eax      ; quotient dword written back in place
    dec esi
    jnz @B
    mov eax, edx                ; return the final remainder
    ret
bigdiv endp
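A Python model of the two procs may make the data flow easier to follow (a list of 32-bit values stands in for the dword array; the names mirror the assembly, the rest is my assumption):

```python
def bigdiv(dvr, num):
    """Divide a little-endian list of 32-bit dwords in place by dvr,
    returning the remainder; mirrors div ebx operating on edx:eax
    from the most significant dword down."""
    rem = 0
    for i in range(len(num) - 1, -1, -1):
        cur = (rem << 32) | num[i]
        num[i] = cur // dvr          # quotient dword written back in place
        rem = cur % dvr
    return rem

def bin2ascii(num):
    """Repeatedly divide by 10, storing digits from the end backwards;
    10 decimal digits are allowed per dword (hence the 2.5x dst buffer)."""
    n = list(num)
    count = len(num) * 10
    out = ['0'] * count
    for i in range(count - 1, -1, -1):
        out[i] = chr(0x30 + bigdiv(10, n))
    return ''.join(out)
```

For example, `bin2ascii([1603, 0])` produces a 20-character string of leading zeros ending in `1603`.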
Thanks for the replies and code.
I found a bug in the first attachment: after checking if brt < brm, it didn't
save the result, so the following code, brm - (brt + 1), brt + 2, was never
executed.
Time with the fixed code increased to 3:35; a few optimizations made it able
to do the first 30 iterations working on a single dword size for brt, brm.
On the "Is it an exact square root" test, i.e. brt*brt == bdd, I delayed the more expensive
tests; they only run if a value gets by the first, quickest test. Time dropped to 3:08.
Added routines to display the last value of bdd, brt, brm in ASCII hex at the end.
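To see where that comparison sits, here is a small Python sketch of a two-bits-at-a-time loop keeping (brt, brm) in roughly the form the posts describe; this is an assumed reconstruction, not the attached code:

```python
def isqrt_pairs(n):
    """Returns (root, remainder). brt runs at twice the final root, so
    the update on a successful compare is brm -= brt + 1, brt += 2."""
    brt, brm = 0, 0
    bits = n.bit_length()
    bits += bits & 1                            # even number of bits
    for shift in range(bits - 2, -2, -2):
        brm = (brm << 2) | ((n >> shift) & 3)   # bring down next bit pair
        brt <<= 1
        if brm >= brt + 1:                      # save the compare result,
            brm -= brt + 1                      # then brm - (brt + 1)
            brt += 2                            # and brt + 2
    return brt >> 1, brm
```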
Added routines:
Convert from decimal (base 10) ASCII string to binary.
Convert from hex (base 16) ASCII string to binary.
Write hex string to logfile.
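For the decimal ASCII to binary direction, a straightforward multiply-by-10-and-add model over 32-bit dwords looks like this (my sketch; the actual routine's layout isn't shown):

```python
def dec2bin(s):
    """Decimal ASCII string to a little-endian list of 32-bit dwords."""
    num = [0]
    for ch in s:
        carry = ord(ch) - 0x30           # next decimal digit
        for i in range(len(num)):
            cur = num[i] * 10 + carry
            num[i] = cur & 0xFFFFFFFF
            carry = cur >> 32
        if carry:
            num.append(carry)            # grow when the number overflows
    return num
```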