This is an inferior prime numbers search algorithm, which I have since optimized so as to work 8 times faster...but even so, it is way slower than the sieve algorithm discussed here

Create a table of primes...

A(x,0)=x+1 th prime

A(x,1)=next odd number divisible by A(x,0)

start

2 (x=0)

3 (x=1)

m=# primes

n=# being checked

counting n in twos beginning at 5,

x=1 to m-1

if n<=A(x,1), then A(x,1)=A(x,1)+2*A(x,0); flag=1

next x

if flag=0 then A(m,0)=n; A(m,1)=n*3; m=m+1;

next n

The programs check and store primes up to the size of array (mx) or memory allocated (memtoalloc). They could do with some optimization... 16384 primes found in 4 seconds on a Pentium III. But it's a non-optimal algorithm anyway...

BASIC and HLA (Assembly) source code and executable included.

Create a table of primes...

A(x,0)=x+1 th prime

A(x,1)=next odd number divisible by A(x,0)

start

2 (x=0)

3 (x=1)

m=# primes

n=# being checked

counting n in twos beginning at 5,

x=1 to m-1

if n<=A(x,1), then A(x,1)=A(x,1)+2*A(x,0); flag=1

next x

if flag=0 then A(m,0)=n; A(m,1)=n*3; m=m+1;

next n

The programs check and store primes up to the size of array (mx) or memory allocated (memtoalloc). They could do with some optimization... 16384 primes found in 4 seconds on a Pentium III. But it's a non-optimal algorithm anyway...

BASIC and HLA (Assembly) source code and executable included.

```
```

mx=1000

dim a(mx,1)

cls

Print"Searching for primes..."

print"2":a(0,0)=2

print"3":a(1,0)=3:a(1,1)=3*3

m=2:n=5

n1:

f=0

for j=m-1 to 1 step -1

if n>=a(j,1) then a(j,1)=a(j,1)+2*a(j,0):f=1

next j

if f=0 then A(m,0)=n:A(m,1)=n*3:m=m+1:'print m"th prime is"n

if m<1000 then n=n+2:goto n1

print m"primes found"

```
```

// **********************************

// Prime Numbers Search Program

// Coded using HLA

//

// **********************************

program primes;

#include( "stdlib.hhf" );

const

memtoalloc :=$2000;

static (4)

memaddr: pointer to byte;

numprimes: int32:=2 * 8; // 2 primes found so far

number: int32:=3;

flag: int32;

t:time.timerec;

begin primes;

console.cls();

console.gotoxy(4, 15);

stdout.put ( "Primes Search Program.", nl);

console.gotoxy(5, 15);

stdout.put ( " Coded using HLA", nl, nl);

stdout.put ( "Initializing memory. Initial Time: ");

time.curTime(t);

stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');

stdout.puti8Size(t.mins, 2, '0'); stdout.put(':');

stdout.puti8Size(t.secs, 2, '0');

// Allocate RAM; On return EAX contains address;

malloc (memtoalloc);

// Store memory address

mov (eax, memaddr);

// Clear memory

mov (memaddr, esi);

xor (eax, eax); // Fill mem with 0

mov (memtoalloc, ecx);

sub (4, ecx);

clm1:

mov (edx, [esi+ecx]);

sub (4, ecx);

jns clm1;

// Commencing Message

stdout.put ( nl, nl,"Commencing Search. Time: ");

time.curTime(t);

stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');

stdout.puti8Size(t.mins, 2, '0'); stdout.put(':');

stdout.puti8Size(t.secs, 2, '0'); stdout.put(nl);

// Check clocks

rdtsc; // First measure of time

// rdtsc takes 13 cycles on Pentium MMX,

push (edx); // I store the 64 bit cycle count on the stack.

push (eax);

mov (memaddr, esi);

mov (2, eax);

mov (eax, [esi+4]);

mov (3, eax);

mov (eax, [esi+12]);

lea (edx, [eax+eax*2]);

mov (edx, [esi+8]);

mov (number, eax);

nx:

mov (numprimes, ecx);

nx_:

sub (8, ecx);

xor (ebx, ebx); // flag

add (2, eax);

nxj:

cmp (eax, [esi+ecx]);

jb nxk;

mov ([esi+ecx+4], edx);

lea (edx, [edx*2]);

add (edx, [esi+ecx]);

mov (1, ebx);

nxk:

sub (8, ecx);

jne nxj;

or (ebx, ebx);

jne nx;

mov (numprimes, ecx);

mov (eax, [esi+ecx+4]);

lea (edx, [eax*2]);

mov (edx, [esi+ecx]);

// stdout.put((type uns32 eax), nl);

add (8, ecx);

mov (ecx, numprimes);

cmp (ecx, memtoalloc);

jne nx_;

mov (eax, number);

// Check clocks

rdtsc; // Second measure of time

push (edx);

push (eax);

// Completion Message

stdout.put ( nl, nl,"Search Stopped at Time: ");

time.curTime(t);

stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');

stdout.puti8Size(t.mins, 2, '0'); stdout.put(':');

stdout.puti8Size(t.secs, 2, '0');

// Report

mov (ecx, numprimes);

shr (3, ecx);

stdout.put (nl, (type uns32 ecx), " Primes found in ");

// Retrieve 2nd rdtsc

pop (eax);

pop (edx);

sub ([esp], eax); // subtract first from second

sbb ([esp+4],edx); // result in EDX:EAX

add (8, esp); // remove first edx, eax from stack

if (edx=0) then

stdout.putu32(eax);

stdout.put(" clock cycles = ", nl);

else

stdout.put ("$",edx," ",eax, " clock cycles = ", nl);

endif;

end primes;

Here is the 'optimized' version. It fixes up some jump targets, keeps some values in registers, and does not cycle through all the primes for each number being checked...not at once anyway...

As a result, it is some orders faster.

However, the algorithm still works in O n^2 time, which is very inefficient... cf Bubbe sort as opposed to Heap or Quick sort.

As a result, it is some orders faster.

However, the algorithm still works in O n^2 time, which is very inefficient... cf Bubbe sort as opposed to Heap or Quick sort.

```
'defint a-h, j-z
```

open "c:\primes2.txt" for output as #1

mx=2000

dim a(mx,1)

cls

Print"Searching for primes..."

print"2":a(0,0)=2

print"3":a(1,0)=3:a(1,1)=3*3

m=2:n=5

n1:

f=0

j=0

n2:

if n>a(j,1) then a(j,1)=a(j,1)+2*a(j,0):goto n2

if n=a(j,1) then n=n+2: goto n1

j=j+1: if j<m then n2

A(m,0)=n:A(m,1)=n*3:m=m+1:'print m"th prime is"n

if m<mx then n=n+2:goto n1

print m"primes found"

close #1

```
// **********************************
```

// Primes Numbers Search Program

// Coded using HLA

//

// **********************************

program primes;

#include( "stdlib.hhf" );

const

memtoalloc :=$10000;

static (4)

memaddr: pointer to byte;

numprimes: int32:=2 * 8; // 2 primes found so far

number: int32:=3;

flag: int32;

t:time.timerec;

begin primes;

console.cls();

console.gotorc(4, 15);

stdout.put ( "Primes Search Program.", nl);

console.gotorc(5, 15);

stdout.put ( " Coded using HLA", nl, nl);

stdout.put ( "Initializing memory. Initial Time: ");

time.curTime(t);

stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');

stdout.puti8Size(t.mins, 2, '0'); stdout.put(':');

stdout.puti8Size(t.secs, 2, '0');

// Allocate RAM; On return EAX contains address;

malloc (memtoalloc);

// Store memory address

mov (eax, memaddr);

// Clear memory

mov (memaddr, esi);

xor (eax, eax); // Fill mem with 0

mov (memtoalloc, ecx);

sub (4, ecx);

clm1:

mov (edx, [esi+ecx]);

sub (4, ecx);

jns clm1;

// Commencing Message

stdout.put ( nl, nl,"Commencing Search. Time: ");

time.curTime(t);

stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');

stdout.puti8Size(t.mins, 2, '0'); stdout.put(':');

stdout.puti8Size(t.secs, 2, '0'); stdout.put(nl);

// Check clocks

rdtsc; // First measure of time

// rdtsc takes 13 cycles on Pentium MMX,

push (edx); // I store the 64 bit cycle count on the stack.

push (eax);

mov (memaddr, esi);

mov (2, eax); // first prime

mov (eax, [esi+4]);

mov (3, eax); // second prime

mov (eax, [esi+12]);

lea (edx, [eax+eax*2]); // next number for 3

mov (edx, [esi+8]);

mov (number, eax);

n0:

add (2, eax); // n=5 first time

mov (8, ecx); // j=1

n2:

cmp (eax, [esi+ecx]);

je n0;

jb nx;

mov ([esi+ecx+4], edx);

lea (edx, [edx*2]);

add (edx, [esi+ecx]);

jmp n2; //

n2a:

cmp (eax, [esi+ecx]);

je n0;

jb nx;

add (edx, [esi+ecx]);

jmp n2a;

nx:

add (8, ecx); // j=j+1

cmp (ecx, numprimes); // if j<m then n1

jb n2;

mov (numprimes, ecx);

mov (eax, [esi+ecx+4]);

lea (edx, [eax+eax*2]);

mov (edx, [esi+ecx]);

// stdout.put((type uns32 eax), nl);

add (8, ecx);

mov (ecx, numprimes);

cmp (ecx, memtoalloc);

jne n0;

mov (eax, number);

// Check clocks

rdtsc; // Second measure of time

push (edx);

push (eax);

// Completion Message

stdout.put ( nl, nl,"Search Stopped at Time: ");

time.curTime(t);

stdout.puti16Size(t.hours, 2, '0'); stdout.put(':');

stdout.puti8Size(t.mins, 2, '0'); stdout.put(':');

stdout.puti8Size(t.secs, 2, '0');

// Report

mov (ecx, numprimes);

shr (3, ecx);

stdout.put (nl, (type uns32 ecx), " Primes found in ");

// Retrieve 2nd rdtsc

pop (eax);

pop (edx);

sub ([esp], eax); // subtract first from second

sbb ([esp+4],edx); // result in EDX:EAX

add (8, esp); // remove first edx, eax from stack

if (edx=0) then

stdout.putu32(eax);

stdout.put(" clock cycles = ", nl);

else

stdout.put ("$",edx," ",eax, " clock cycles = ", nl);

endif;

end primes;

Hints on further optimization purely as a matter of programming challenge are welcome.

I accept that this algorithm is WAAYYY slow. For discussion of better routines go to the link in the first panel.

(Yeah, I code in HLA, whcih uses brackets, and

I accept that this algorithm is WAAYYY slow. For discussion of better routines go to the link in the first panel.

(Yeah, I code in HLA, whcih uses brackets, and

**source, destination**. The Intel**dest, source**order used by most other Assemblers is__obviously__*wrong*. But its more popular So what?!?)Erm, what do you mean, "wrong"? Isn't that up to Intel to decide? Since they're, um, the

**makers**of the processor and all.I felt is was wrong, too. But that is coming from 680x0 programming. Now it seems very natural and I wouldn't have it any other way. I can really understand where

**V Coder**is coming from.My apologies: My 'HLA versus other assemblers' barb was probably more fitted to the Crusades, right? ;)

My real point was this though: "Don't just complain that it is written in HLA. Plus, in case you're trying to understand the code, remember to reverse instruction order."

I sometimes code in MASM format and include it in my HLA code with #asm ........ #endasm. However, I am trying to avoid needing to do so. But the algorithm can be understood and copied easily once these differences are noted beforehand...

My real point was this though: "Don't just complain that it is written in HLA. Plus, in case you're trying to understand the code, remember to reverse instruction order."

I sometimes code in MASM format and include it in my HLA code with #asm ........ #endasm. However, I am trying to avoid needing to do so. But the algorithm can be understood and copied easily once these differences are noted beforehand...

// Clear memory

mov (memaddr, esi);

xor (eax, eax); // Fill mem with 0

mov (memtoalloc, ecx);

sub (4, ecx);

clm1:

mov (edx, );

sub (4, ecx);

jns clm1;

mov (memaddr, esi);

xor (eax, eax); // Fill mem with 0

mov (memtoalloc, ecx);

sub (4, ecx);

clm1:

mov (edx, );

sub (4, ecx);

jns clm1;

Why do you clear

**eax**, though store

**edx**into memory?

**comrade**

Now that is a major error which I had not noticed!!! Thanks.

I copied that code from another program I have been working on, and it was corrected there already so long ago (just over a month perhaps) that I cannot remember correcting it...

(Strictly speaking, of course, I need not clear the memory at all, since the data will be written to the memory whenever the index is updated to reflect there is another prime found... This is probably why the program does not report any error, and the list of primes is correct.)

** **