imul reg0,reg1
adc reg0,0
imul reg2,reg3
adc reg2,0
imul reg0,reg2
adc reg0,0
je ZeroFound
Posted on 2002-07-27 03:37:47 by Nexo
Assuming that your variables are an array of bools of 1 bit of size per element, you can test 32 of them at once (64 using MMX (128 using SSE (256 using something else :grin: ))) and detect if at least one of them was at 0, with this trivial code:

NOT reg
TEST reg,reg ; the stupid IA32 architecture doesn't update CPU flags on NOT and NEG
JNZ label
Posted on 2002-07-27 04:14:25 by Maverick
Stupid me, I sleep too little :grin:

This is more optimized:

CMP reg,-1
JNE label
Posted on 2002-07-27 04:32:19 by Maverick
The stupid IA32 architecture update CPU flags on NEG.
Optimized equivalent NOT with update flags is XOR reg,-1.
Optimized equivalent CMP reg,-1/JNE is INC reg/JNE.
Posted on 2002-07-27 11:27:26 by Nexo
I will try to do it with MMX, it will be a good opportunity to learn how it works. Thanks again.
Posted on 2002-07-27 14:29:40 by Dr. Manhattan
i thought about usign the nearest way:

or a, b
or a, c
or a, d
test a, a
jz all_are_zero
// code for your a||b||c||d-condition

i think its about 14 bytes. is that way that bad so nobody posted it?!
Posted on 2002-07-27 14:54:05 by hartyl
hartyl, what Dr. Manhattan really wants is: if (a=0) or (b=0) or (c=0) or (d=0).

Dr. Manhattan, MMX will be slower due to having to move the comparisons into a 32-bit register and test again. The method I posted will test all possiblities for the whole board in less than 40 cycles on an AMD Athlon (with there being no connect fours). I have tested it - 40 cycles for the whole board seems good, no?
Posted on 2002-07-27 19:25:00 by bitRAKE

The stupid IA32 architecture update CPU flags on NEG.
Optimized equivalent NOT with update flags is XOR reg,-1.
Optimized equivalent CMP reg,-1/JNE is INC reg/JNE.

Hey Nexo, did that offend you or something? ;)

Yes, on X86 NEG updates the flags.. as I wrote I sleep too little. :grin:

INC reg is destructive, anyway.
Posted on 2002-07-28 04:14:49 by Maverick

Hey Nexo, did that offend you or something? ;)

No, I am also more like Motorola CPU ;)
Posted on 2002-07-28 07:31:31 by Nexo
Hi Nexo,
That is my major source of confusion, I still code much (free time permitting, it's not job) on the 68060 today.
I was in a hurry and produced those 3 ugly lines of x86 code, then immediately I did regret and did the CMP version. I shouldn't talk bad about the x86, at least here, but after some linear (meant as instruction set and registers) coding on the 680x0 I was a bit confused on the x86. ;)

Thanks for the correction, BTW. :)
Posted on 2002-07-28 10:07:36 by Maverick
in c++ (a || b) equeals (a or b)! but ok!
(a==0) || (b==0) || (c==0) || (d==0) = !a || !b || !c || !d - this is a boolean expresion, which sould be written like this:
!a v !b v !c v !d = (now we use the boolean rules) =
! (a & b & c & d), which we should translate like this:

and a, b
and a, c
and a, d
test a, a
jnz all_are_non_zero
; code for (!a v !b v !c v !d)

i think this should work too
Posted on 2002-07-28 14:27:41 by hartyl
hartyl, you're almost up to speed on the conversation. :tongue: Posted on 2002-07-28 14:50:08 by bitRAKE
bitRake, how do you do to test a possibility that needs two registers ? For example the diagonal a3- d6 has 3 bits in eax and 1 bit in edx.
Posted on 2002-07-28 17:42:56 by Dr. Manhattan
Dr. Manhattan, if you line the array up linearly we have 42 bits - EAX holds bits 0 thru 31, and EDX holds bits 10 thru 41. You will find that there are only two diagonals not completely in one register. These two test are put at the end and by destroying the values in EAX/EDX branching only occurs when there is a connect four. Three registers could be used to cover the whole board. Simply:
mov eax,P1_Board

mov edx,P1_Board + 1
mov ecx,P1_Board + 2
My test also show the same or better timing for direct memory tests - once the board data is in L1 cache. This is a strange result - I thought memory tests would be slower? See the attached picture for the two diagonals:
Posted on 2002-07-28 18:24:04 by bitRAKE
bitRAKE, you test is slow. For all 7x6 field you need make maximum 69 tests (plus double TEST). Also you miss - each square is 2 bit (it also plus some TESTs). Obviously, what optimum maximum tests must be not greater then 7x6. Each TEST must check only one square. And travel through binary tree (for optimization branches may be united). The square for TEST must selected by depending on more probable placement 4. The start square for TEST is center of field.
Posted on 2002-07-29 11:27:30 by Nexo
Nexo, I cheat. :) If I don't like the format of the data I change it - each square is one bit (see this post) - only need to test for one side each turn. I don't transverse tree and I only test whole board for one player. Tree is a good idea, but how would you avoid branching? The method I propose does not branch unless there is a connect four - which means no branch in most common case.
Posted on 2002-07-29 12:16:20 by bitRAKE
Yes! You understand my idea. If one square - 1 bit. One TEST will check one square (even 2 bits it no problem). And branching for more probable connected four on both case (hit or miss). While not hit for connected four or miss for all combinations connected four on plaing field. Even if connect four not found, may be not all square checked. Because tree will "known" what squares needs for check in present and what squares not needs. Tree method is more scalable for another field sizes. Always still one TEST. And maximum executed TESTs = size of field. I think here more intresting consider problem of building this tree ;) That equivalented code generation of TESTs sequences. How much size of this tree? How build this tree? A good questions :grin:
Posted on 2002-07-29 13:44:37 by Nexo
In this case branch miss is too great - better to test all 69 cases. The general problem is much more interesting though! :) I will think about code to create tree code and how to minimize branching given a general board.
Posted on 2002-07-29 14:02:46 by bitRAKE
Example. Field 3x2. Search connect two squares.
Different field combination = 6+15=21
bitRAKE algo average tests = (1+2+3+4+5+6+7+8+9+10+11+11*10)/21=8,38 tests.
On tree (all paths) average tests = (6+5+5+6+6+5+2+3+4+5+6+3+4+6+5+4+6+5+6+5+6)/21=4,9 tests.
For more big field difference of averages will be greater.
May be little errors :) Midnight. I am want sleep....
Posted on 2002-07-29 15:04:53 by Nexo
Nexo, I do not understand the math. :)
The goal of the game is connect four and each player prevents the other from doing this - this is not very likely to happen (I have played this game as a boy, I know :)). Also, in the game the bottom of the board fills up first as the pieces go in the top. So, the probabilities are not as even as the math makes it seem! Additionally, you add no cost penalty for bad jumps, and we know this is costly (hence the original question). Maybe, I will try to build a set of macros that will allow us to test different bad jump costs and board sizes - this will be useful for other problems as well. ;)
Posted on 2002-07-29 16:50:27 by bitRAKE