I need to code a combonation function (nCr) but I very quickly ran into some problems that Im not sure how to work around.

First of all, the combonation function:
nCr = n! / r!(n - r)!


The 'n exclamation mark', 'n!' means 'n factorial', where n is an integer greater than or equal to zero.

it itself is a function:
n! = 1 * 2 * 3 *.... * (n - 2)(n - 1)n

for example:
8! = 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 40320
5! =5 * 4 * 3 * 2 * 1 = 120
(A special case is where 0! = 1)


This leaves me with a long series of multiplcations, and a divide.

There is half a way around this:
if
6! / 3! = 6 * 5 * 4 * 3 * 2 * 1 / 3 * 2 * 1
then
6! / 3! = 6 * 5 * 4
because the rest cancels itself out, which simplifys things a bit.


If anyone can see or knows an easier way to code this than a fully blown out unoptimized it would be much appreciated.

-Thanks
Posted on 2002-04-25 05:04:26 by huh
; esi = n

; edi = r
mov edx, esi
sub edx, edi

mov eax, [FacList + edi*8]
; r!(n-r)! {must fit into 32-bits}
mul [FacList + edx*8]
mov ecx,eax

mov eax,[FacList + esi*8]
mov edx,[FacList + esi*8 + 4]

div ecx
; eax = nCr
FacList are the QWORD factorials.
This is very limited. :(

There should be a way to cancel many terms of the factorials.
n=7; r=3;


7 6 5 4 3 2 1
*** *** *** *** (n!/r!)
*** *** *** *** (n-r)!
See here how the 4 overlaps, and can be factored out. Have to think a little more on how to do that easily.
Posted on 2002-04-25 12:42:15 by bitRAKE
This is very flexible - I believe it will work for any n! up to 16! and many others. The minimum of (r,(n-r)) has to be [0-16]. Also, n!/MAX(r,(n-r)) has to fit in 64 bits. It can be slow. Let me know if your wondering how it works.
; EDX:EAX * 32 bit constant ; select register/memory to trash

; assume result fits into 64 bits
MUL64 MACRO val:REQ, regmem:REQ
mov regmem,eax
mov eax,edx
mul val
xchg eax,regmem
mul val
add edx,regmem
ENDM


; Combintorial: nCr = n! / r!(n-r)! ; n >= r
;
; esi = n
; edi = r
;
mov ecx,esi
sub ecx,edi

; edi = MAX(r, (n-r))
cmp edi,ecx
jnc @F
mov edi,ecx
@@:
mov eax,1
xor edx,edx
mov ebx,esi
jmp _5
_4:
MUL64 ebx, ecx
dec ebx
_5: cmp edi,ebx
jc _4
sub esi,edi ; esi = MIN(r, (n-r)) = [0,16]
mov ecx,[FacListDD2 + esi*8 + 4]
shrd eax,edx,cl
shr edx,cl
div [FacListDD2 + esi*8]
; eax = nCr
ret

const SEGMENT
FacListDD2:
; base power ; n! = base * 2^(power)
dd 1, 0 ; 0!
dd 1, 0 ;
dd 1, 1 ;
dd 3, 1 ;
dd 3, 3 ;
dd 3*5, 3 ; 5!
dd 9*5, 4 ;
dd 9*5*7, 4 ;
dd 9*5*7, 7 ;
dd 81*5*7, 7 ;
dd 81*25*7, 8 ; 10!
dd 81*25*7*11, 8 ;
dd 243*25*7*11, 10 ;
dd 243*25*7*11*13, 10 ;
dd 243*25*49*11*13, 11 ;
dd 729*125*49*11*13, 11 ; 15!
dd 729*125*49*11*13, 15 ;
; dd 729*125*49*11*13*17, 15 ;
; dd 6561*125*49*11*13*17, 16 ;(18!)
const ENDS
Edit: There are a couple bugs. :(
Idea is good, though. :)
Edit: Think I might have fixed it for r=0.
Posted on 2002-04-25 22:10:28 by bitRAKE
Very nice, I am impressed. As far as I can tell will calculate any combonation up to 29Cr. After that the results start to get a bit random as the numbers start to get bigger.


I have attached a basic output which I have quickly scratched together.

Cheers for the hand. I still have a long way to go :)
Posted on 2002-04-26 05:56:50 by huh
That is great to hear. I didn't have time to test it. Think I'll write an FPU version and confirm the results.
Posted on 2002-04-26 07:30:37 by bitRAKE
An example of nCr, also written C(n,r):

         6 * 5 * 4 * 3 * 2 * 1   6 * 5 * 4 * 3

C(6,4) = --------------------- = -------------
4 * 3 * 2 * 1 * 2 * 1 1 * 2 * 3 * 4
To get larger results without resorting to floating point, you can alternate between multiplication and division. Note that if you multiply three consecutive integers together, one of them will be a multiple of 3. Similarly, for any consecutive sequence of integers.

nCr = nC(n-r)

As you can figure out, C(6,2) yields the same number and would use less division with the alternating multiply/divide algorithm.

You may also want to try a recursive algorithm based on Pascal's triangle.

nC(0) = 1
nCn = 1
nCr = (n-1)C(r-1) + (n-1)Cr

Or simply calculate each row of Pascal's triangle until you get to row n, and pick out the r'th entry.

(I wonder how long it would take to calculate something like (54321543)C(27123543).)
Posted on 2002-04-26 16:51:03 by tenkey
tenkey, all those solutions sound really slow. :) Your C(6,4) example reduces to C(6,2) - you just didn't get rid of the 3 and 4. The symetry is why the algo above finds the MAX(r,(n-r)) to reduce the iterations of the MUL64 loop. The FPU should be able to do something like ~60Cr. I wonder when we would need (54321543)C(27123543)? It would be fun to try and solve it though. :grin: Roy has a BigInt lib that would work.
Posted on 2002-04-26 17:30:40 by bitRAKE
bitRAKE, I think you underestimate the FP, I found a little program, (written in Visual Basic of all langauges) that can calculate 268C134 (or 268Cr). I would guess a better written asm prog could go even higher.

My goal for the result of the combonation is to use it to calculate binomal distributions/probability's (which are always between 0 and 1) so this could be intergrated into the function, giving a even greater range.
Posted on 2002-04-26 18:56:39 by huh
huh, sorry I should have said, "using the same algo above."
Yes, there are slower algos that have greater flexiblity.

I have over estimated the difficulty of this problem. For very large numbers all that needs to be done is prime factorization and cancel terms to get the resulting integer. This reduces to some very simple code working up to (2^32)C. The result does not always fit into 32 bits, of course.

(2^32 - 1)C(2^31 - 1) = ;) I don't think Mathematica would finish today! (note: number was too large for Mathematica.)

Try this out with my prime sieve! :grin:
CHOOSE MACRO __n__,__r__

; nCr = n!
; ---------- ; n >= r
; r!(n-r)!
;
; nCr = n*(n-1)*...*MAX((n-r),r)
; --------------------------
; 1*2*3*...*MIN((n-r),r)
;
;
;
;
;
; Note: should add/subtract in the following order:
; C(24,5) = (+24-1)+(23-2)+(22-3)+(21-4)+(20-5)
; ...to ensure all products are reduced with minimal stack size
;
push ebp
mov ebp,esp

mov ebx,PrimeBits ; prime sieve!
mov edi,1
mov esi,__n__
push edi ;)

_3: ; over range [F(MAX(r,(n-r))), F(n)]
; store prime factors of ESI on the stack
mov eax, esi

; divide by all primes less than N until = 1
mov ecx, 2
jmp _3a

_31: ; not a product of ECX
; restore EAX to test again
pop eax

; get next prime from ECX
dec ecx ; needed for 2
shr ecx,1
_32: ; test for primeness
bt [ebx],ecx
inc ecx
jnc _32
lea ecx,[ecx*2 + 1] ; prime #

_3a: ; catch final prime or 1
push eax
cmp eax, ecx
jng _3x

xor edx,edx
div ecx

test edx,edx
jne _31

mov [esp],ecx ; save factor
jmp _3a
_3x:

;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

_4: ; over range [F(2), F(MIN(r,n-r))]
; subtract prime factors of EDI at [ESP,EBP]
mov eax, edi

; divide by all primes less than N until = 1
mov ecx, 2
jmp _4a

_41: ; not a product of ECX
; restore EAX to test again
pop eax

; get next prime from ECX
dec ecx ; needed for 2
shr ecx,1
_42: ; test for primeness
bt [ebx],ecx
inc ecx
jnc _42
lea ecx,[ecx*2 + 1] ; prime #

_4a: ; catch final prime or 1 or 0
cmp eax, ecx
jng _4x

xor edx,edx
push eax
div ecx

test edx,edx
jne _41

; find factor on stack and erase
lea edx,[esp+4]
add esp,4
@@: cmp [edx], ecx
lea edx, [edx+4]
jne @B
pop [edx-4]
jmp _4a

_4x: ; delete final prime or 1
mov edx,esp
@@: cmp [edx], eax
lea edx, [edx+4]
jne @B
pop [edx-4]


dec esi
inc edi

cmp edi,__r__
jle _3

; build number from factorization
; on stack [ESP,EBP)

xor edx,edx
mov eax,1

@@: mov ecx,[esp]
xchg eax,ecx
mul edx
; EDX is zero
xchg eax,ecx
mul DWORD PTR [esp]
add edx,ecx
pop ecx
cmp ebp,esp
jne @B
pop ebp
; EAX:EDX is U64 result
ENDM
C(n,r) ; r should be MIN(r,(n-r)) to use above algo.

It's not the fastest for very large numbers, but gets the job done and I can understand how it works. Major slow down is the brute force approach to factoring - easy to plug into other routines. Stack memory is used for convienence.

Often playing the card game Spades, we only have three players (we play partners and one guy has to leave). So, we just throw two jokers in and everyone gets 17 cards. Well, with all those cards 47,153,358,767,970 hands are possible! (54C17 and the algorithm hardly used the stack!)
Posted on 2003-09-01 21:25:03 by bitRAKE
Hi Bitrake,

the Prime"factorization" Method leads to the fastest possible Method to compute the factorial of large numbers. In fact 1.000.000! can be computed in this way in 3.5 seconds on a P4 1.5Ghz. The results have 5.565.709 decimal digits. Now, the "primefactorization" method can be easily expanded to strike out all duplicate primepower exponents if we want the binomial.

As Example
10! = ((3)^2 * 5)^2 * 7 * 2^8 -> shortest computational path for 10!
5! = ((3)^1 * 5)^1 * * 2^3

now for C(10,5) we strike out the exponents by subtractions
C(10,5) = 10!/5!/5!

10! = 3^4 * 5^2 * 7^1 * 2^8
- 5! = 3^1 * 5^1 * 7^0 * 2^3
..... = 3^3 * 5^1 * 7^1 * 2^5
- 5! = 3^1 * 5^1 * 7^0 * 2^3
..... = 3^2 * 7 * 2^2 = 252

thus C(10,5) = 252

Above we get the primepower expansion and strike out by subtraction each exponent.
After this we can reorder each exponent such as above shown
252 = 3^2* 7 * 2^2 = (3*2)^2 * 7

Here we get one Multiplication + one Squaring + one Multiplication. Thats the computational shortest possible path to compute this value. One Squaring is about 1.5 times faster as the multiplication (of course with very large numbers).


Hagen

PS: Arnold Sch?nhage has proposed this way.
Posted on 2003-09-02 05:13:43 by Hagen
Hagen, in the algorithm above I use the method you have explained. I put the prime factors on the stack and it is quite fast in practice. For example, C(10,5):


. = push 1
10 = push 5,2
1 = pop 1
9 = push 3,3
2 = pop 2
8 = push 2,2,2
3 = pop 3
7 = push 7
4 = pop 2,2
6 = push 3,2
5 = pop 5
Which leaves 3,2,7,2,3 on the stack = 252 :)

I don't fully calculate 10! or even accumulate the factors for the lower part (let Arnold Sch?nhage know of my improvement). The algorithm was designed with specific dependancies on x86 ASM and is very fast except the prime decomposition.

Do you know of a faster prime decomposition?
Posted on 2003-09-02 07:59:49 by bitRAKE
Yes I have seen that you do it in this way, but it's harder to understand a mathematicaly problem only based on asm source :)

I wanted to show why we can use primepower extraction and how to use it for factorials, binomials, product and other combinatoric functions.

> Do you know of a faster prime decomposition?

Your way is for small computations even optimal. For larger numbers with use of floatingpoints we don't need any real computation, there exists faily very exact approxiamtions. Such apporximations are very very fast.

For exact computation with very large numbers is the essential trick to find the shortest path with common powers. Eg. we need a way to multiplicate all products very fast. In this context we have to prefer Squaring over Multiplication because Squaring is 1.5 times faster as Multiplications.
After this we have to multiply some Primorials (eg. Lists of many small primes) very fast. The fastest way are called Binary Splitting for this Operations.

As example 50! =
((((3)^2 *
5*7)^2 *
3*5*11)^2 *
3*13*17*19*23)^2 *
13*29*31*37*41*43*47 * 2^47

As we seen we have some Product liek 3*13*17*19*23 to build. Here we use binary splitting by ((3*13) * 17) * (19 * 23). Eg. a weighted multiplications.

If we consider all this need we see that a Power Table, a array of Prime + Exponent are a very good idea.
So we have to build such a table based on N of N!.


here some small pseudo code:

procedure Powers(Table, N: Integer);
{
I = 0
while true do
{
P = Primes
if P > N then break
I = I +1
E = 0
K = N
while K > P do {K = K div P, E = E + K }
if E > 0 then { Table.Base = P, Table.Exponent = E }
}

}

Above we extract the prime power exponents only for N, but we could extract it in one step as example to N/K/L where we have (N! / K! / L!). In this case we have for each prime to compute the exponents to N,K,L in one loop and the subtract these exponents. The final power table could be that one we need to compute the binomial.

By the way, the power to prime 2 can be computed easier.

C(10,3) = 10! / 7! / 3! thus N = 10, K = 7, L = 3 and

(N - L - K) - (BitWeight(N) - BitWeight(L) - BitWeight(K)) = Count of trailing 0 LSBits in the Binomial of C(10, 3).

Thus N - BitWeight(N) = count of trailing 0 bits in N!.

BitWeight(N) is the Count of 1 Bits in N, hamming weight.

After we get this table we have to examine how are the shortest path with a maximum of Squarings.
The trick is fairly easy:

we have now our Powertable such as follow

3^56 * 5^34 * 7^15 * 11^7 * 13^3 * 17^2 * 19^1 * 23^1 * 29^1

an we want

such as (((3*5)^2 * 7)^2 * 11 * 13)^2 .... (only example not math. correct:)

To get this we write down to each prime its power as binary value

3^001111000
5^000110011
7^000011010
...

and so on

now each Bitcolumn from left to right are computed where the column contains 1.

From above we get.

3
3*5
3*5*7
3*7
5

->>

((((3)^2 * (3*5))^2 * (3*5*7))^2 * (3*7))^2 * 5

Thats the shortest path. In fact, when you know how works Exponentations Algorithms = Binary Powering then You can see the direct dependencies to above way. It's nothing others as a binary Powering Algorithms to multiple Bases and Exponents, like A = B^C * D^E * F^G.

For any interesst I attached a small executable with the Pascal/Delphi Source to compute Factorials/Binomials/Permutations/Products/Primorials to arbitrary numbers.
In the source you can find some quit different Factorial Computations Algorithms.


Regards, Hagen

PS: i hope my bad english is'nt to bad :)
Posted on 2003-09-02 12:55:28 by Hagen
I think this is same as Hagen's function, in C:
long contdiv(long x ,long i){

long tot;
do{
x=floor(x / i);
tot+=x;
}while (x>=1);
return tot;
}


x is number under check of factor numbers of prime i
That is called contdiv (maybe an other term but translated word by word it is 'continuous division')



main(){
long i,chknum; char *k;
do{
k=strcat(k, " * ");
k=strcat(k, prs[i]);
k=strcat(k, "^");
k=strcat(k, contdiv(chknum,prs[i]));
}while prs[i]<chknum
printf (k);
}

prs is prime list.
The function does not calculates factorial but prints as above
10! : 2^8 * 3^4 * 5^2 * 7
100! : 2^97 * 3^48 * 5^24 * 7^16 * 11^9 * 13^7 * 17^5 * 19^5 * 23^4 * 29^3 * 31^3 * 37^2 * 41^2 * 43^2 * 47^2 * 53 * 53 * 59 * 61 * 67 * 71 * 73 * 79 * 83 * 89 * 97
.
.
I have written this code when we learned how to find how many digits does a factorial has without calculating itself and with out logarithms at school. Algorithm does not seems fast but does the job;)
Posted on 2003-09-02 14:54:47 by inFinie
Yes thats the right code translation if my C knownledge is correct :)

> Algorithm does not seems fast but does the job

Thats dependend on the viewpoint. As i currently can say it is the fastest to compute the factorial/binomial etc. to arbitrary size. There exists currently no other known method with same low complexity. The small 32bit Divisions and the needed memory for the prime power table can be ignored, in comparsion to the needed squarings and multiplications.

Some people proposed and stated that there exists a FFT based factorial function with far lower complexity. But I never found it :(

My attached executable is as i know that fastest PC program to compute such factorials :)

Regards Hagen
Posted on 2003-09-02 17:40:40 by Hagen
Thank you Hagen and inFinie. I am sorry my math skills are lacking - my asm is better. I'll review the code (I can read C/Pascal equally well) and convert it to asm if appropriate - for comparison. Hopefully, I can learn more about math through my coding. :)
Posted on 2003-09-02 19:34:00 by bitRAKE
http://mathworld.wolfram.com/Factorial.html
n!=gamma(n+1)

Here is a source to approximate gamma function in TASM
http://www.crbond.com/download/misc/gamma32.asm
Posted on 2003-09-03 04:18:26 by inFinie