I need to code a combonation function (n

First of all, the combonation function:

n

The 'n exclamation mark', '

it itself is a function:

n! = 1 * 2 * 3 *.... * (n - 2)(n - 1)n

for example:

8! = 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 40320

5! =5 * 4 * 3 * 2 * 1 = 120

(A special case is where 0! = 1)

This leaves me with a long series of multiplcations, and a divide.

There is half a way around this:

if

6! / 3! = 6 * 5 * 4 * 3 * 2 * 1 / 3 * 2 * 1

then

6! / 3! = 6 * 5 * 4

because the rest cancels itself out, which simplifys things a bit.

If anyone can see or knows an easier way to code this than a fully blown out unoptimized it would be much appreciated.

-Thanks

**C**r) but I very quickly ran into some problems that Im not sure how to work around.First of all, the combonation function:

n

**C**r = n! / r!(n - r)!The 'n exclamation mark', '

**n!**' means 'n factorial', where n is an integer greater than or equal to zero.it itself is a function:

n! = 1 * 2 * 3 *.... * (n - 2)(n - 1)n

for example:

8! = 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 40320

5! =5 * 4 * 3 * 2 * 1 = 120

(A special case is where 0! = 1)

This leaves me with a long series of multiplcations, and a divide.

There is half a way around this:

if

6! / 3! = 6 * 5 * 4 * 3 * 2 * 1 / 3 * 2 * 1

then

6! / 3! = 6 * 5 * 4

because the rest cancels itself out, which simplifys things a bit.

If anyone can see or knows an easier way to code this than a fully blown out unoptimized it would be much appreciated.

-Thanks

```
; esi = n
```

; edi = r

mov edx, esi

sub edx, edi

mov eax, [FacList + edi*8]

; r!(n-r)! {must fit into 32-bits}

mul [FacList + edx*8]

mov ecx,eax

mov eax,[FacList + esi*8]

mov edx,[FacList + esi*8 + 4]

div ecx

; eax = nCr

FacList are the QWORD factorials.
This is very limited. :(

There should be a way to cancel many terms of the factorials.

```
n=7; r=3;
```

7 6 5 4 3 2 1

*** *** *** *** (n!/r!)

*** *** *** *** (n-r)!

See here how the 4 overlaps, and can be factored out. Have to think a little more on how to do that easily.This is very flexible - I believe it will work for any n! up to 16! and many others. The minimum of (r,(n-r)) has to be [0-16]. Also, n!/MAX(r,(n-r)) has to fit in 64 bits. It can be slow. Let me know if your wondering how it works.

Idea is good, though. :)

```
; EDX:EAX * 32 bit constant ; select register/memory to trash
```

; assume result fits into 64 bits

MUL64 MACRO val:REQ, regmem:REQ

mov regmem,eax

mov eax,edx

mul val

xchg eax,regmem

mul val

add edx,regmem

ENDM

; Combintorial: nCr = n! / r!(n-r)! ; n >= r

;

; esi = n

; edi = r

;

mov ecx,esi

sub ecx,edi

; edi = MAX(r, (n-r))

cmp edi,ecx

jnc @F

mov edi,ecx

@@:

mov eax,1

xor edx,edx

mov ebx,esi

jmp _5

_4:

MUL64 ebx, ecx

dec ebx

_5: cmp edi,ebx

jc _4

sub esi,edi ; esi = MIN(r, (n-r)) = [0,16]

mov ecx,[FacListDD2 + esi*8 + 4]

shrd eax,edx,cl

shr edx,cl

div [FacListDD2 + esi*8]

; eax = nCr

ret

const SEGMENT

FacListDD2:

; base power ; n! = base * 2^(power)

dd 1, 0 ; 0!

dd 1, 0 ;

dd 1, 1 ;

dd 3, 1 ;

dd 3, 3 ;

dd 3*5, 3 ; 5!

dd 9*5, 4 ;

dd 9*5*7, 4 ;

dd 9*5*7, 7 ;

dd 81*5*7, 7 ;

dd 81*25*7, 8 ; 10!

dd 81*25*7*11, 8 ;

dd 243*25*7*11, 10 ;

dd 243*25*7*11*13, 10 ;

dd 243*25*49*11*13, 11 ;

dd 729*125*49*11*13, 11 ; 15!

dd 729*125*49*11*13, 15 ;

; dd 729*125*49*11*13*17, 15 ;

; dd 6561*125*49*11*13*17, 16 ;(18!)

const ENDS

**Edit**: There are a couple bugs. :(Idea is good, though. :)

**Edit**: Think I might have fixed it for r=0.Very nice, I am impressed. As far as I can tell will calculate any combonation up to 29

I have attached a basic output which I have quickly scratched together.

Cheers for the hand. I still have a long way to go :)

**C**r. After that the results start to get a bit random as the numbers start to get bigger.I have attached a basic output which I have quickly scratched together.

Cheers for the hand. I still have a long way to go :)

That is great to hear. I didn't have time to test it. Think I'll write an FPU version and confirm the results.

An example of nCr, also written C(n,r):

nCr = nC(n-r)

As you can figure out, C(6,2) yields the same number and would use less division with the alternating multiply/divide algorithm.

You may also want to try a recursive algorithm based on Pascal's triangle.

nC(0) = 1

nCn = 1

nCr = (n-1)C(r-1) + (n-1)Cr

Or simply calculate each row of Pascal's triangle until you get to row n, and pick out the r'th entry.

(I wonder how long it would take to calculate something like (54321543)C(27123543).)

```
6 * 5 * 4 * 3 * 2 * 1 6 * 5 * 4 * 3
```

C(6,4) = --------------------- = -------------

4 * 3 * 2 * 1 * 2 * 1 1 * 2 * 3 * 4

To get larger results without resorting to floating point, you can alternate between multiplication and division. Note that if you multiply three **consecutive**integers together, one of them will be a multiple of 3. Similarly, for any consecutive sequence of integers.nCr = nC(n-r)

As you can figure out, C(6,2) yields the same number and would use less division with the alternating multiply/divide algorithm.

You may also want to try a recursive algorithm based on Pascal's triangle.

nC(0) = 1

nCn = 1

nCr = (n-1)C(r-1) + (n-1)Cr

Or simply calculate each row of Pascal's triangle until you get to row n, and pick out the r'th entry.

(I wonder how long it would take to calculate something like (54321543)C(27123543).)

**tenkey**, all those solutions sound

__really__slow. :) Your C(6,4) example reduces to C(6,2) - you just didn't get rid of the 3 and 4. The symetry is why the algo above finds the MAX(r,(n-r)) to reduce the iterations of the MUL64 loop. The FPU should be able to do something like ~60Cr. I wonder when we would need (54321543)C(27123543)? It would be fun to try and solve it though. :grin:

**Roy**has a BigInt lib that would work.

bitRAKE, I think you underestimate the FP, I found a little program, (written in Visual Basic of all langauges) that can calculate 268

My goal for the result of the combonation is to use it to calculate binomal distributions/probability's (which are always between 0 and 1) so this could be intergrated into the function, giving a even greater range.

**C**134 (or 268**C**r). I would guess a better written asm prog could go even higher.My goal for the result of the combonation is to use it to calculate binomal distributions/probability's (which are always between 0 and 1) so this could be intergrated into the function, giving a even greater range.

**huh**, sorry I should have said, "using the same algo above."

Yes, there are slower algos that have greater flexiblity.

I have over estimated the difficulty of this problem. For very large numbers all that needs to be done is prime factorization and cancel terms to get the resulting integer. This reduces to some very simple code working up to (2^32)

**C**. The result does not always fit into 32 bits, of course.

(2^32 - 1)

**C**(2^31 - 1) = ;) I don't think Mathematica would finish today! (note: number was too large for Mathematica.)

Try this out with my prime sieve! :grin:

```
CHOOSE MACRO __n__,__r__
```

; nCr = n!

; ---------- ; n >= r

; r!(n-r)!

;

; nCr = n*(n-1)*...*MAX((n-r),r)

; --------------------------

; 1*2*3*...*MIN((n-r),r)

;

;

;

;

;

; Note: should add/subtract in the following order:

; C(24,5) = (+24-1)+(23-2)+(22-3)+(21-4)+(20-5)

; ...to ensure all products are reduced with minimal stack size

;

push ebp

mov ebp,esp

mov ebx,PrimeBits ; prime sieve!

mov edi,1

mov esi,__n__

push edi ;)

_3: ; over range [F(MAX(r,(n-r))), F(n)]

; store prime factors of ESI on the stack

mov eax, esi

; divide by all primes less than N until = 1

mov ecx, 2

jmp _3a

_31: ; not a product of ECX

; restore EAX to test again

pop eax

; get next prime from ECX

dec ecx ; needed for 2

shr ecx,1

_32: ; test for primeness

bt [ebx],ecx

inc ecx

jnc _32

lea ecx,[ecx*2 + 1] ; prime #

_3a: ; catch final prime or 1

push eax

cmp eax, ecx

jng _3x

xor edx,edx

div ecx

test edx,edx

jne _31

mov [esp],ecx ; save factor

jmp _3a

_3x:

;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

_4: ; over range [F(2), F(MIN(r,n-r))]

; subtract prime factors of EDI at [ESP,EBP]

mov eax, edi

; divide by all primes less than N until = 1

mov ecx, 2

jmp _4a

_41: ; not a product of ECX

; restore EAX to test again

pop eax

; get next prime from ECX

dec ecx ; needed for 2

shr ecx,1

_42: ; test for primeness

bt [ebx],ecx

inc ecx

jnc _42

lea ecx,[ecx*2 + 1] ; prime #

_4a: ; catch final prime or 1 or 0

cmp eax, ecx

jng _4x

xor edx,edx

push eax

div ecx

test edx,edx

jne _41

; find factor on stack and erase

lea edx,[esp+4]

add esp,4

@@: cmp [edx], ecx

lea edx, [edx+4]

jne @B

pop [edx-4]

jmp _4a

_4x: ; delete final prime or 1

mov edx,esp

@@: cmp [edx], eax

lea edx, [edx+4]

jne @B

pop [edx-4]

dec esi

inc edi

cmp edi,__r__

jle _3

; build number from factorization

; on stack [ESP,EBP)

xor edx,edx

mov eax,1

@@: mov ecx,[esp]

xchg eax,ecx

mul edx

; EDX is zero

xchg eax,ecx

mul DWORD PTR [esp]

add edx,ecx

pop ecx

cmp ebp,esp

jne @B

pop ebp

; EAX:EDX is U64 result

ENDM

C(n,r) ; r should be MIN(r,(n-r)) to use above algo.
It's not the fastest for very large numbers, but gets the job done and I can understand how it works. Major slow down is the brute force approach to factoring - easy to plug into other routines. Stack memory is used for convienence.

Often playing the card game Spades, we only have three players (we play partners and one guy has to leave). So, we just throw two jokers in and everyone gets 17 cards. Well, with all those cards 47,153,358,767,970 hands are possible! (54

**C**17 and the algorithm hardly used the stack!)

Hi Bitrake,

the Prime"factorization" Method leads to the fastest possible Method to compute the factorial of large numbers. In fact 1.000.000! can be computed in this way in 3.5 seconds on a P4 1.5Ghz. The results have 5.565.709 decimal digits. Now, the "primefactorization" method can be easily expanded to strike out all duplicate primepower exponents if we want the binomial.

As Example

10! = ((3)^2 * 5)^2 * 7 * 2^8 -> shortest computational path for 10!

5! = ((3)^1 * 5)^1 * * 2^3

now for C(10,5) we strike out the exponents by subtractions

C(10,5) = 10!/5!/5!

10! = 3^4 * 5^2 * 7^1 * 2^8

- 5! = 3^1 * 5^1 * 7^0 * 2^3

..... = 3^3 * 5^1 * 7^1 * 2^5

- 5! = 3^1 * 5^1 * 7^0 * 2^3

..... = 3^2 * 7 * 2^2 = 252

thus C(10,5) = 252

Above we get the primepower expansion and strike out by subtraction each exponent.

After this we can reorder each exponent such as above shown

252 = 3^2* 7 * 2^2 = (3*2)^2 * 7

Here we get one Multiplication + one Squaring + one Multiplication. Thats the computational shortest possible path to compute this value. One Squaring is about 1.5 times faster as the multiplication (of course with very large numbers).

Hagen

PS: Arnold Sch?nhage has proposed this way.

the Prime"factorization" Method leads to the fastest possible Method to compute the factorial of large numbers. In fact 1.000.000! can be computed in this way in 3.5 seconds on a P4 1.5Ghz. The results have 5.565.709 decimal digits. Now, the "primefactorization" method can be easily expanded to strike out all duplicate primepower exponents if we want the binomial.

As Example

10! = ((3)^2 * 5)^2 * 7 * 2^8 -> shortest computational path for 10!

5! = ((3)^1 * 5)^1 * * 2^3

now for C(10,5) we strike out the exponents by subtractions

C(10,5) = 10!/5!/5!

10! = 3^4 * 5^2 * 7^1 * 2^8

- 5! = 3^1 * 5^1 * 7^0 * 2^3

..... = 3^3 * 5^1 * 7^1 * 2^5

- 5! = 3^1 * 5^1 * 7^0 * 2^3

..... = 3^2 * 7 * 2^2 = 252

thus C(10,5) = 252

Above we get the primepower expansion and strike out by subtraction each exponent.

After this we can reorder each exponent such as above shown

252 = 3^2* 7 * 2^2 = (3*2)^2 * 7

Here we get one Multiplication + one Squaring + one Multiplication. Thats the computational shortest possible path to compute this value. One Squaring is about 1.5 times faster as the multiplication (of course with very large numbers).

Hagen

PS: Arnold Sch?nhage has proposed this way.

**Hagen**, in the algorithm above I use the method you have explained. I put the prime factors on the stack and it is quite fast in practice. For example, C(10,5):

```
```

. = push 1

10 = push 5,2

1 = pop 1

9 = push 3,3

2 = pop 2

8 = push 2,2,2

3 = pop 3

7 = push 7

4 = pop 2,2

6 = push 3,2

5 = pop 5

Which leaves 3,2,7,2,3 on the stack = 252 :)
I don't fully calculate 10! or even accumulate the factors for the lower part (let Arnold Sch?nhage know of my improvement). The algorithm was designed with specific dependancies on x86 ASM and is very fast except the prime decomposition.

Do you know of a faster prime decomposition?

Yes I have seen that you do it in this way, but it's harder to understand a mathematicaly problem only based on asm source :)

I wanted to show why we can use primepower extraction and how to use it for factorials, binomials, product and other combinatoric functions.

> Do you know of a faster prime decomposition?

Your way is for small computations even optimal. For larger numbers with use of floatingpoints we don't need any real computation, there exists faily very exact approxiamtions. Such apporximations are very very fast.

For exact computation with very large numbers is the essential trick to find the shortest path with common powers. Eg. we need a way to multiplicate all products very fast. In this context we have to prefer Squaring over Multiplication because Squaring is 1.5 times faster as Multiplications.

After this we have to multiply some Primorials (eg. Lists of many small primes) very fast. The fastest way are called Binary Splitting for this Operations.

As example 50! =

((((3)^2 *

5*7)^2 *

3*5*11)^2 *

3*13*17*19*23)^2 *

13*29*31*37*41*43*47 * 2^47

As we seen we have some Product liek 3*13*17*19*23 to build. Here we use binary splitting by ((3*13) * 17) * (19 * 23). Eg. a weighted multiplications.

If we consider all this need we see that a Power Table, a array of Prime + Exponent are a very good idea.

So we have to build such a table based on N of N!.

here some small pseudo code:

procedure Powers(Table, N: Integer);

{

I = 0

while true do

{

P = Primes

I wanted to show why we can use primepower extraction and how to use it for factorials, binomials, product and other combinatoric functions.

> Do you know of a faster prime decomposition?

Your way is for small computations even optimal. For larger numbers with use of floatingpoints we don't need any real computation, there exists faily very exact approxiamtions. Such apporximations are very very fast.

For exact computation with very large numbers is the essential trick to find the shortest path with common powers. Eg. we need a way to multiplicate all products very fast. In this context we have to prefer Squaring over Multiplication because Squaring is 1.5 times faster as Multiplications.

After this we have to multiply some Primorials (eg. Lists of many small primes) very fast. The fastest way are called Binary Splitting for this Operations.

As example 50! =

((((3)^2 *

5*7)^2 *

3*5*11)^2 *

3*13*17*19*23)^2 *

13*29*31*37*41*43*47 * 2^47

As we seen we have some Product liek 3*13*17*19*23 to build. Here we use binary splitting by ((3*13) * 17) * (19 * 23). Eg. a weighted multiplications.

If we consider all this need we see that a Power Table, a array of Prime + Exponent are a very good idea.

So we have to build such a table based on N of N!.

here some small pseudo code:

procedure Powers(Table, N: Integer);

{

I = 0

while true do

{

P = Primes

if P > N then break

I = I +1

E = 0

K = N

while K > P do {K = K div P, E = E + K }

if E > 0 then { Table.Base = P, Table.Exponent = E }

}

}

Above we extract the prime power exponents only for N, but we could extract it in one step as example to N/K/L where we have (N! / K! / L!). In this case we have for each prime to compute the exponents to N,K,L in one loop and the subtract these exponents. The final power table could be that one we need to compute the binomial.

By the way, the power to prime 2 can be computed easier.

C(10,3) = 10! / 7! / 3! thus N = 10, K = 7, L = 3 and

(N - L - K) - (BitWeight(N) - BitWeight(L) - BitWeight(K)) = Count of trailing 0 LSBits in the Binomial of C(10, 3).

Thus N - BitWeight(N) = count of trailing 0 bits in N!.

BitWeight(N) is the Count of 1 Bits in N, hamming weight.

After we get this table we have to examine how are the shortest path with a maximum of Squarings.

The trick is fairly easy:

we have now our Powertable such as follow

3^56 * 5^34 * 7^15 * 11^7 * 13^3 * 17^2 * 19^1 * 23^1 * 29^1

an we want

such as (((3*5)^2 * 7)^2 * 11 * 13)^2 .... (only example not math. correct:)

To get this we write down to each prime its power as binary value

3^001111000

5^000110011

7^000011010

...

and so on

now each Bitcolumn from left to right are computed where the column contains 1.

From above we get.

3

3*5

3*5*7

3*7

5

->>

((((3)^2 * (3*5))^2 * (3*5*7))^2 * (3*7))^2 * 5

Thats the shortest path. In fact, when you know how works Exponentations Algorithms = Binary Powering then You can see the direct dependencies to above way. It's nothing others as a binary Powering Algorithms to multiple Bases and Exponents, like A = B^C * D^E * F^G.

For any interesst I attached a small executable with the Pascal/Delphi Source to compute Factorials/Binomials/Permutations/Products/Primorials to arbitrary numbers.

In the source you can find some quit different Factorial Computations Algorithms.

Regards, Hagen

PS: i hope my bad english is'nt to bad :)if P > N then break

I = I +1

E = 0

K = N

while K > P do {K = K div P, E = E + K }

if E > 0 then { Table.Base = P, Table.Exponent = E }

}

}

Above we extract the prime power exponents only for N, but we could extract it in one step as example to N/K/L where we have (N! / K! / L!). In this case we have for each prime to compute the exponents to N,K,L in one loop and the subtract these exponents. The final power table could be that one we need to compute the binomial.

By the way, the power to prime 2 can be computed easier.

C(10,3) = 10! / 7! / 3! thus N = 10, K = 7, L = 3 and

(N - L - K) - (BitWeight(N) - BitWeight(L) - BitWeight(K)) = Count of trailing 0 LSBits in the Binomial of C(10, 3).

Thus N - BitWeight(N) = count of trailing 0 bits in N!.

BitWeight(N) is the Count of 1 Bits in N, hamming weight.

After we get this table we have to examine how are the shortest path with a maximum of Squarings.

The trick is fairly easy:

we have now our Powertable such as follow

3^56 * 5^34 * 7^15 * 11^7 * 13^3 * 17^2 * 19^1 * 23^1 * 29^1

an we want

such as (((3*5)^2 * 7)^2 * 11 * 13)^2 .... (only example not math. correct:)

To get this we write down to each prime its power as binary value

3^001111000

5^000110011

7^000011010

...

and so on

now each Bitcolumn from left to right are computed where the column contains 1.

From above we get.

3

3*5

3*5*7

3*7

5

->>

((((3)^2 * (3*5))^2 * (3*5*7))^2 * (3*7))^2 * 5

Thats the shortest path. In fact, when you know how works Exponentations Algorithms = Binary Powering then You can see the direct dependencies to above way. It's nothing others as a binary Powering Algorithms to multiple Bases and Exponents, like A = B^C * D^E * F^G.

For any interesst I attached a small executable with the Pascal/Delphi Source to compute Factorials/Binomials/Permutations/Products/Primorials to arbitrary numbers.

In the source you can find some quit different Factorial Computations Algorithms.

Regards, Hagen

PS: i hope my bad english is'nt to bad :)

*
*

I think this is same as Hagen's function, in C:

That is called contdiv (maybe an other term but translated word by word it is 'continuous division')

prs is prime list.

The function does not calculates factorial but prints as above

10! : 2^8 * 3^4 * 5^2 * 7

100! : 2^97 * 3^48 * 5^24 * 7^16 * 11^9 * 13^7 * 17^5 * 19^5 * 23^4 * 29^3 * 31^3 * 37^2 * 41^2 * 43^2 * 47^2 * 53 * 53 * 59 * 61 * 67 * 71 * 73 * 79 * 83 * 89 * 97

.

.

I have written this code when we learned how to find how many digits does a factorial has without calculating itself and with out logarithms at school. Algorithm does not seems fast but does the job;)

```
long contdiv(long x ,long i){
```

long tot;

do{

x=floor(x / i);

tot+=x;

}while (x>=1);

return tot;

}

**x**is number under check of factor numbers of prime**i**That is called contdiv (maybe an other term but translated word by word it is 'continuous division')

```
```

main(){

long i,chknum; char *k;

do{

k=strcat(k, " * ");

k=strcat(k, prs[i]);

k=strcat(k, "^");

k=strcat(k, contdiv(chknum,prs[i]));

}while prs[i]<chknum

printf (k);

}

prs is prime list.

The function does not calculates factorial but prints as above

10! : 2^8 * 3^4 * 5^2 * 7

100! : 2^97 * 3^48 * 5^24 * 7^16 * 11^9 * 13^7 * 17^5 * 19^5 * 23^4 * 29^3 * 31^3 * 37^2 * 41^2 * 43^2 * 47^2 * 53 * 53 * 59 * 61 * 67 * 71 * 73 * 79 * 83 * 89 * 97

.

.

I have written this code when we learned how to find how many digits does a factorial has without calculating itself and with out logarithms at school. Algorithm does not seems fast but does the job;)

Yes thats the right code translation if my C knownledge is correct :)

> Algorithm does not seems fast but does the job

Thats dependend on the viewpoint. As i currently can say it is the fastest to compute the factorial/binomial etc. to arbitrary size. There exists currently no other known method with same low complexity. The small 32bit Divisions and the needed memory for the prime power table can be ignored, in comparsion to the needed squarings and multiplications.

Some people proposed and stated that there exists a FFT based factorial function with far lower complexity. But I never found it :(

My attached executable is as i know that fastest PC program to compute such factorials :)

Regards Hagen

> Algorithm does not seems fast but does the job

Thats dependend on the viewpoint. As i currently can say it is the fastest to compute the factorial/binomial etc. to arbitrary size. There exists currently no other known method with same low complexity. The small 32bit Divisions and the needed memory for the prime power table can be ignored, in comparsion to the needed squarings and multiplications.

Some people proposed and stated that there exists a FFT based factorial function with far lower complexity. But I never found it :(

My attached executable is as i know that fastest PC program to compute such factorials :)

Regards Hagen

Thank you

**Hagen**and**inFinie**. I am sorry my math skills are lacking - my asm is better. I'll review the code (I can read C/Pascal equally well) and convert it to asm if appropriate - for comparison. Hopefully, I can learn more about math through my coding. :)http://mathworld.wolfram.com/Factorial.html

n!=gamma(n+1)

Here is a source to approximate gamma function in TASM

http://www.crbond.com/download/misc/gamma32.asm

n!=gamma(n+1)

Here is a source to approximate gamma function in TASM

http://www.crbond.com/download/misc/gamma32.asm