This is an inferior prime numbers search algorithm, which I have since optimized so as to work 8 times faster...but even so, it is way slower than the sieve algorithm discussed here

Create a table of primes...
A(x,0)=x+1 th prime
A(x,1)=next odd number divisible by A(x,0)
start
2 (x=0)
3 (x=1)
m=# primes
n=# being checked
counting n in twos beginning at 5,
x=1 to m-1
if n<=A(x,1), then A(x,1)=A(x,1)+2*A(x,0); flag=1
next x
if flag=0 then A(m,0)=n; A(m,1)=n*3; m=m+1;
next n

The programs check and store primes up to the size of array (mx) or memory allocated (memtoalloc). They could do with some optimization... 16384 primes found in 4 seconds on a Pentium III. But it's a non-optimal algorithm anyway...

BASIC and HLA (Assembly) source code and executable included.

``````
mx=1000
dim a(mx,1)
cls

Print"Searching for primes..."
print"2":a(0,0)=2
print"3":a(1,0)=3:a(1,1)=3*3
m=2:n=5

n1:
f=0
for j=m-1 to 1 step -1
if n>=a(j,1) then a(j,1)=a(j,1)+2*a(j,0):f=1
next j
if f=0 then A(m,0)=n:A(m,1)=n*3:m=m+1:'print m"th prime is"n
if m<1000 then n=n+2:goto n1
print m"primes found"
``````

``````
// **********************************
// Prime Numbers Search Program
//         Coded using HLA
//
// **********************************

program primes;
#include( "stdlib.hhf" );

const
memtoalloc :=\$2000;
static (4)

numprimes: int32:=2 * 8;                 // 2 primes found so far
number: int32:=3;
flag: int32;

t:time.timerec;

begin primes;

console.cls();
console.gotoxy(4, 15);
stdout.put ( "Primes Search Program.", nl);
console.gotoxy(5, 15);
stdout.put ( "        Coded using HLA", nl, nl);

stdout.put ( "Initializing memory. Initial Time: ");
time.curTime(t);
stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');
stdout.puti8Size(t.mins, 2, '0');   stdout.put(':');
stdout.puti8Size(t.secs, 2, '0');

// Allocate RAM; On return EAX contains address;
malloc (memtoalloc);

// Clear memory
xor (eax, eax);      // Fill mem with 0
mov (memtoalloc, ecx);
sub (4, ecx);
clm1:
mov (edx, [esi+ecx]);

sub (4, ecx);
jns clm1;

// Commencing Message
stdout.put ( nl, nl,"Commencing Search.         Time: ");
time.curTime(t);
stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');
stdout.puti8Size(t.mins, 2, '0');   stdout.put(':');
stdout.puti8Size(t.secs, 2, '0');   stdout.put(nl);

// Check clocks
rdtsc;                       // First measure of time
// rdtsc takes 13 cycles on Pentium MMX,
push (edx);                  // I store the 64 bit cycle count on the stack.
push (eax);

mov (2, eax);
mov (eax, [esi+4]);
mov (3, eax);
mov (eax, [esi+12]);
lea (edx, [eax+eax*2]);
mov (edx, [esi+8]);

mov (number, eax);

nx:
mov (numprimes, ecx);

nx_:
sub (8, ecx);

xor (ebx, ebx);                   // flag

nxj:
cmp (eax, [esi+ecx]);
jb nxk;

mov ([esi+ecx+4], edx);
lea (edx, [edx*2]);
mov (1, ebx);

nxk:
sub (8, ecx);
jne nxj;

or (ebx, ebx);
jne nx;

mov (numprimes, ecx);

mov (eax, [esi+ecx+4]);
lea (edx, [eax*2]);
mov (edx, [esi+ecx]);
//     stdout.put((type uns32 eax), nl);

mov (ecx, numprimes);

cmp (ecx, memtoalloc);
jne nx_;

mov (eax, number);

// Check clocks
rdtsc;                       // Second measure of time
push (edx);
push (eax);

// Completion Message
stdout.put ( nl, nl,"Search Stopped at Time: ");
time.curTime(t);
stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');
stdout.puti8Size(t.mins, 2, '0');   stdout.put(':');
stdout.puti8Size(t.secs, 2, '0');

// Report
mov (ecx, numprimes);
shr (3, ecx);
stdout.put (nl, (type uns32 ecx), " Primes found  in ");

// Retrieve 2nd rdtsc
pop (eax);
pop (edx);

sub ([esp], eax);             // subtract first from second
sbb ([esp+4],edx);            // result in EDX:EAX

add (8, esp);                 // remove first edx, eax from stack

if (edx=0) then
stdout.putu32(eax);
stdout.put(" clock cycles = ", nl);
else
stdout.put ("\$",edx," ",eax, " clock cycles = ", nl);
endif;

end primes;
``````
Posted on 2003-06-26 21:25:41 by V Coder
Here is the 'optimized' version. It fixes up some jump targets, keeps some values in registers, and does not cycle through all the primes for each number being checked...not at once anyway...

As a result, it is some orders faster.

However, the algorithm still works in O n^2 time, which is very inefficient... cf Bubbe sort as opposed to Heap or Quick sort.

``````'defint a-h, j-z
open "c:\primes2.txt" for output as #1
mx=2000
dim a(mx,1)
cls

Print"Searching for primes..."
print"2":a(0,0)=2
print"3":a(1,0)=3:a(1,1)=3*3
m=2:n=5

n1:
f=0
j=0
n2:
if n>a(j,1) then a(j,1)=a(j,1)+2*a(j,0):goto n2
if n=a(j,1) then n=n+2: goto n1
j=j+1: if j<m then n2
A(m,0)=n:A(m,1)=n*3:m=m+1:'print m"th prime is"n
if m<mx then n=n+2:goto n1
print m"primes found"

close #1``````

``````// **********************************
// Primes Numbers Search Program
//         Coded using HLA
//
// **********************************

program primes;
#include( "stdlib.hhf" );

const
memtoalloc :=\$10000;
static (4)

numprimes: int32:=2 * 8;                 // 2 primes found so far
number: int32:=3;
flag: int32;

t:time.timerec;

begin primes;

console.cls();
console.gotorc(4, 15);
stdout.put ( "Primes Search Program.", nl);
console.gotorc(5, 15);
stdout.put ( "        Coded using HLA", nl, nl);

stdout.put ( "Initializing memory. Initial Time: ");
time.curTime(t);
stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');
stdout.puti8Size(t.mins, 2, '0');   stdout.put(':');
stdout.puti8Size(t.secs, 2, '0');

// Allocate RAM; On return EAX contains address;
malloc (memtoalloc);

// Clear memory
xor (eax, eax);      // Fill mem with 0
mov (memtoalloc, ecx);
sub (4, ecx);
clm1:
mov (edx, [esi+ecx]);

sub (4, ecx);
jns clm1;

// Commencing Message
stdout.put ( nl, nl,"Commencing Search.         Time: ");
time.curTime(t);
stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');
stdout.puti8Size(t.mins, 2, '0');   stdout.put(':');
stdout.puti8Size(t.secs, 2, '0');   stdout.put(nl);

// Check clocks
rdtsc;                       // First measure of time
// rdtsc takes 13 cycles on Pentium MMX,
push (edx);                  // I store the 64 bit cycle count on the stack.
push (eax);

mov (2, eax);                     // first prime
mov (eax, [esi+4]);
mov (3, eax);                     // second prime
mov (eax, [esi+12]);
lea (edx, [eax+eax*2]);           // next number for 3
mov (edx, [esi+8]);

mov (number, eax);

n0:
add (2, eax);                     // n=5 first time
mov (8, ecx);                     // j=1

n2:
cmp (eax, [esi+ecx]);
je n0;

jb nx;

mov ([esi+ecx+4], edx);
lea (edx, [edx*2]);
jmp n2;                           //
n2a:
cmp (eax, [esi+ecx]);
je n0;
jb nx;
jmp n2a;

nx:
cmp (ecx, numprimes);             // if j<m then n1
jb n2;

mov (numprimes, ecx);

mov (eax, [esi+ecx+4]);
lea (edx, [eax+eax*2]);
mov (edx, [esi+ecx]);
//     stdout.put((type uns32 eax), nl);

mov (ecx, numprimes);

cmp (ecx, memtoalloc);
jne n0;

mov (eax, number);

// Check clocks
rdtsc;                       // Second measure of time
push (edx);
push (eax);

// Completion Message
stdout.put ( nl, nl,"Search Stopped at Time: ");
time.curTime(t);
stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');
stdout.puti8Size(t.mins, 2, '0');   stdout.put(':');
stdout.puti8Size(t.secs, 2, '0');

// Report
mov (ecx, numprimes);
shr (3, ecx);
stdout.put (nl, (type uns32 ecx), " Primes found  in ");

// Retrieve 2nd rdtsc
pop (eax);
pop (edx);

sub ([esp], eax);             // subtract first from second
sbb ([esp+4],edx);            // result in EDX:EAX

add (8, esp);                 // remove first edx, eax from stack

if (edx=0) then
stdout.putu32(eax);
stdout.put(" clock cycles = ", nl);
else
stdout.put ("\$",edx," ",eax, " clock cycles = ", nl);
endif;

end primes;``````
Posted on 2003-06-26 21:39:19 by V Coder
Hints on further optimization purely as a matter of programming challenge are welcome.

I accept that this algorithm is WAAYYY slow. For discussion of better routines go to the link in the first panel.

(Yeah, I code in HLA, whcih uses brackets, and source, destination. The Intel dest, source order used by most other Assemblers is obviously wrong. But its more popular So what?!?)
Posted on 2003-06-26 21:48:07 by V Coder
Erm, what do you mean, "wrong"? Isn't that up to Intel to decide? Since they're, um, the makers of the processor and all.
Posted on 2003-06-27 17:03:00 by Sephiroth3
I felt is was wrong, too. But that is coming from 680x0 programming. Now it seems very natural and I wouldn't have it any other way. I can really understand where V Coder is coming from.
Posted on 2003-06-27 18:35:29 by bitRAKE
My apologies: My 'HLA versus other assemblers' barb was probably more fitted to the Crusades, right? ;)

My real point was this though: "Don't just complain that it is written in HLA. Plus, in case you're trying to understand the code, remember to reverse instruction order."

I sometimes code in MASM format and include it in my HLA code with #asm ........ #endasm. However, I am trying to avoid needing to do so. But the algorithm can be understood and copied easily once these differences are noted beforehand...
Posted on 2003-06-28 21:57:17 by V Coder
// Clear memory
xor (eax, eax); // Fill mem with 0
mov (memtoalloc, ecx);
sub (4, ecx);
clm1:
mov (edx, );

sub (4, ecx);
jns clm1;

Why do you clear eax, though store edx into memory?
Posted on 2003-06-28 22:18:31 by comrade