Hi Homer,

and suggest that my oopified translation of the core code be a good place to start.

Do you mean the NNet project that comes with the ObjAsm32 package?

Friendly regards,
Posted on 2006-12-30 11:54:15 by mdevries
The NNet class object and the associated XOR test/demo were written by me, based strongly on previous work by Thomas Bleek.
You can take the NNet class object and use it to create a larger network, more similar to Thomas' DigiBrain demo .. in that demo, he's associating one neuron with each pixel on a (64x64?) grid, and using that grid to recognize 'mouse gestures' as characters from the english alphabet.

Of course, for chess, we want more than just 'on/off' for each grid cell.
In order to use such a neural network to recognize chess patterns, we have to create a smaller grid (the chessboard) with n neurons associated with each cell, where n = 16 (the number of different pieces in the game).
The best way of teaching this neural network, especially when it is an 'infant', is to allow it to watch real games being played between two humans..
We create or load our NN, and then we clone it so we have two copies.. each copy will monitor one player.. (we'll see why at the end)..
At the beginning of Turn A, we show the current state of chessboard to the NN.
At the end of Turn A (start of Turn B), we teach the NN that the correct output for gamestate A is gamestate B.
We execute a loop of training cycles, showing the NN state A until it quits returning junk and returns B the way it should.
Then we repeat the above until the game has been Won.
When the game is Won, we discard the NN of the player who lost, and we save the winner NN, thus our winner NN becomes a little smarter having remembered the moves which led to winning yet another game. By using separate NNs, we have avoided inadvertently making our NN 'more stupid' - we don't care to remember how to lose.

When we've trained our NN for a while, we can try playing against the computer by switching the NN from 'Training mode' to 'Run mode'.. when its the computer's turn, we show the chessboard to the NN, and then set the chessboard to whatever the NN outputs are.. at first, the computer will be a poor adversary, so we can save the NN state to a datafile and call it a "difficulty level", then we can train our NN some more with real games, and repeat all this process, making several "difficulty levels" until the NN is pretty much unbeatable :)

The way Thomas' handwriting recognizer works is very similar to the above - just less complex - but the idea is still that we monitor the grid, feeding its whole state to the NN, and have the NN remember the states that led to a Character being written, in order to recognize when a character IS being written.
Posted on 2006-12-30 21:54:36 by Homer
Found a couple more (small) bugs in the Portals code.
I'm now able to correctly classify and split the large rectangular Portals (ie fragment them), so I'm not too far away from completing the next stage.. probably be posting an update today.

I've moved a little code around, extended the functionality of a couple of procs etc, but really haven't added any more code - I see no point adding new code until the existing code is up to scratch.

This bsptree generator project is kinda cool when compared to the others that are floating around .. from the testing I've done so far it seems to handle models with 'bad polygons', and/or models which are 'not closed'.
The first problem - triangles which are very small, or which have one very short edge, I seem to have gotten around this issue by A) using an 'epsilon' value to give my planes some tolerance (thickness), and B) using a specially modified intersection detection and calculation which is more numerically stable (when the input values are extremely large or small) than those algorithms I have seen posted on the net.
I'm quite proud of that IntersectionRayPlane function.. sure its not quite as fast as the popular ratio-based solution, but its very accurate, it isnt screwed up by extreme values, and it returns not only the intersection point, but also the distance to the intersection (along the edge in question) in the form of a Magnitude scalar - which is VERY useful for calculating the texture coordinate for the new vertex formed at such an intersection.
The only problem this function has is that it doesn't TEST for an intersection per se - it ASSUMES one exists, and merely calculates where it is .. the test for an intersection of an edge and a plane is actually IMPLIED by classifying each triangle's 3 vertices against the plane.. if we find two vertices on different sides of the plane, there MUST be an intersection.
Posted on 2007-01-01 12:39:32 by Homer
I've just been executing a test run of the Portal fragmentation code.
Oh My God..

I thought that generating the BSPTree was computationally expensive.
As previously mentioned, on my 500mhz development box, the test input file (Tiny.x) takes about 12 minutes to generate the BSPTree.
Now we have a crapload of 'BSPNodes', each of which 'owns a Plane', and we've built a large rectangular polygon at each of these planes.
I've been watching my application 'fragment' the FIRST of these rectangles for well over half an hour, and we're sitting around the 300k triangles range... at this rate, we'll complete the fragmentation pass by June.. I think I'm going to be forced to add some code to try to further reduce 'splits' by eliminating 'bad triangles' , but this is a two-edged sword .. we're reducing the overall amount of computation, but we're making each individual computation take longer - how do we calculate the 'break even' point?

I've attached a binary exe - please test and let me know roughly how long the Portal fragmentation takes on a faster box :P

Posted on 2007-01-02 00:52:33 by Homer
I know you're not too keen about looking at "prior art" :), but while it took a long time to run back on my 100MHz 486, the quake BSP/vis/light cycle was in the days rather than months range ;)

I'll grab the DX9 SDK and look for tiny.x...
Posted on 2007-01-02 01:53:17 by f0dder
At this point, my code has a serious problem.
I'm generating hundreds of thousands of faces during the fragmenting of just ONE of my potential portal quads.
I'm not sure how many extra vertices that implies, but it has to be more than the 64k limit imposed by the word-sized vertex indices in my TriFace structure.

I have to make those indices 32 bits.
I can't use Pointers there, because the vertexbuffer is dynamically reallocated every time a new vertex is added - this means that pointers are frequently made redundant, and so we can't use them.

Using indices and a growable vb is still preferable to a maximal array imho.. after all, eventually the MAJORITY of the faces we're generating will be destroyed, along with their vertices, and we have NO way of predicting the resource requirements up front.

Posted on 2007-01-02 02:09:29 by Homer
Perhaps a different data structure would be in order?

Linked-list is probably way too much overhead, but what about "linked chunks" (like C++ deque is usually implemented with afaik)? If you only ever need to grow (and don't need to move things around etc), it should be pretty trivial to implement, and looking up an entry from a 32bit index would also be easy.
Posted on 2007-01-02 02:13:26 by f0dder
Quake levels typically contain MANY coplanar faces - ie, many faces sharing few planes.. people who design Quake levels were always very careful to make sure that they took the BSP process into account WHILE creating their geometry - for example, inserting extra faces whose sole purpose is to form a nice splitting plane.
They took this concept further in Quake 2, with the addition of 'hint brushes' - these are nothing more than a few carefully-selected extra splitting planes (which don't necessarily contain any faces).
Quake 3's level editor takes this even further, by defining simple convex objects as simple bsptrees which can be combined to create a more complex and yet optimal bsptree with very little processing required.

I did mention that I'd chosen a particularly poor input file, and I'm not shocked at the horrible results.. I'm yet to feed a quake-style world to my generator (with lots of nice orthogonals etc), but I expect my generator to outperform theirs in both worst and best case scenarios, I'm not kidding.

Posted on 2007-01-02 02:19:03 by Homer
Yes, my memory access could be streamlined more - linkedlists are not going to make this code any more efficient, since our lists tend to be linear, but reallocating for array growth could be done in chunks rather than on a per-object basis.
I don't think there's any way I could really improve on the data structures involved, but their accessing could certainly do with some spit and polish.
Posted on 2007-01-02 02:23:58 by Homer
Side note: christ the DX9 SDK is packaged lamely... self-extracting file with a single self-extracting file in it. At least the second file has files directly in it, instead of setup+cabs :)

So, how do I get an estimate on how long the generation would take? And does the algorithm + data structures lend itself to multithreading, or is the next cycle dependant on the previous?
Posted on 2007-01-02 02:37:40 by f0dder
The initial construction of the BSPTree is a recursive process which takes one input list, sorts it into potentially three output lists, stores potentiall two of those into child nodes, lather, rinse and repeat.

A child node can't be processed until its parent has been (obviously), but in cases where we have created two childs, we could in theory spawn a new thread to process the 'other' branch, and have our processing threads just kinda die when they run out of work to do rather than returning from recursion.

Two problems I see here : firstly the existing code is designed to be recursive, and we'd have to alter that behaviour - not a huge problem, most difficulty there is that I've spent months training myself to think in recursive terms with regards to this algorithm.

Second problem I see is that we'd have to introduce mutexes on all our arrays (and some of our currently global variables such as counters) eg mutexed access functions, in order to prevent our threads from screwing each other up when they want to add new faces etc - effectively I think we'd end up running sequentially, only  with lots of context switches and mutex checking added.. it sounds slower to me in theory than a single thread !!

I've wrapped the first stage code (tree generation) in its own thread, just so it doesnt lock the gui, which does contain a timer to tell how long that stage took - but the second stage code (portal generation) is currently executing from the main process thread - so the gui locks up due to MessagePump not getting much of a fair go.. and the timer isn't being used in stage 2, so you'll have to time it yourself (sorry!)

Posted on 2007-01-02 02:54:35 by Homer
Sounds like multithreading it is too much bother; simple counter variables don't need to be mutexed, they can be incremented atomically, but it does sound like the algorithm doesn't lend itself to (simple) threading. Haven't looked into the BSP algo, but I sorta expected this.

BSP tree generation took 1:42, let's see how long portalling takes :)

Posted on 2007-01-02 03:01:29 by f0dder
Replacing the word sized vertex indices in my TriFace struct with dword sized indices has yielded a speedup of maybe 25% :D

Also, far fewer portal fragments are being generated now - I guess I was right, I'd 'clocked' my indices and was referring to the wrong vertices :P

Attached binary :)
Posted on 2007-01-02 03:08:43 by Homer
Ok, I've killed off portalling and will redo then. Sounds like a decent speed increase, nasty old 16bit words foo :). New code took 1:41 to treebuild, so I assume the speedup was just for the portalling.

Btw, in the portalling stage, if the window is obscured, it'll stop repainting.
EDIT: the portalling stage seems to spend a sizable amount of time in kernel mode (about 50% in periods) - I assume this is memory realloc, so you could probably gain some significant speed by changing the memory layout and routines...

EDIT #2: oh great, now it crashed after obscuring the window, after running for 12 minutes. And I accidentally closed the JIT popup instead of noting the crash address etc. >_<
Posted on 2007-01-02 03:13:50 by f0dder
Thanks for the feedback so far :)

Yeah mine crashed too - although I didn't wait for it, I went and had a couple of beers at a friends place..

When I got back I saw that the first portal had completed fragmentation - heres' the last two lines from my debug spew:

.DwordCollection.dCount = 57915t, #Fragments after Fragmenting
Frogmarching fragments

When each big rectangle has been completely fragmented into a zillion shards, we use a function called FrogmarchPortalList whose job is to feed the fragments into the root node of the tree, and then to recursively filter them down the tree into the leaves.
I do this immediately after each big rect has been shattered, so that we don't have to worry about overwriting fragment lists in any lower nodes - ie, topdown recursion, with a nested topdown recursion :P

The frogmarch function seems to be the culprit, I'll look into it.
(I'm switching to a smaller and more simple xfile until this is fixed)
Posted on 2007-01-02 06:16:29 by Homer
Hey frog!

Yeah you!
Hey frog froggy frog frog frog; All of my fears came true!
Black and blue and crashing code you left me here I?m all alone...

I'm sure somebody will eventually catch the reference :)
Posted on 2007-01-02 06:21:32 by f0dder
Bad news..

When I tried a smaller xfile, I got further :|
I tried tiger.x, and I found I was able to fragment and frogmarch TWO big planar rects, it'd crash on the third.
Looking into the code to see where we landed, I found we landed in the DivideFaces proc, which is tested and working, so I'm guessing this is a thread stack overflow problem - and since we're using the main process thread at this stage, it means I need to change the default process stack in my linker options - can you remember that switch?

I'm going to go out on a limb here and suggest this is NOT simply a case of me screwing up the stack, it's due to the recursion depth and the number of stackframes that implies.
Posted on 2007-01-02 10:15:13 by Homer
link /stack:reserve,commit other options - easy peasy :). Check the value of ESP and compare to esp-at-program-entry (NT doesn't randomize memory space layout, so this should be safe enough for a quick test like this). Also, keep in mind that large stack allocations need touching to avoid pagefaults.

(Haven't looked at your code, throwing guesses).
Posted on 2007-01-02 17:22:52 by f0dder
I'll try relinking with a larger stack first, maybe doubling the default stack from 2 pages to 4 pages for starters (keeping it page aligned).
I'm feeling confidant that this will solve the problem, but really the problem is of my own making - you see, I have a nested recursion of the entire tree, so we're recursing a lot deeper than it might appear.
One layer of that recursion is mandatory, but the other is absolutely unnecessary and the offending code could be linearized.
Hell, I'll probably do BOTH.

It's stinking hot right now, I'll wait until the cool change blows in from the sea, right now I'm going to walk down to the beach and make lude comments at passing women while sipping on an ice cold beer or three :)

Posted on 2007-01-02 23:24:50 by Homer
Increasing the stack size by a factor of 2 got me exactly twice as much processing before it fell over.

I spent this morning adding debug code to check the values of esp either side of certain procedure calls, and found no issues.

I disabled the Frogmarch proc, et voila - no more issue.
Looking closely at that proc I still see no problem.
Guess I'll spend some time rethinking my strategy in regards to reducing if not eliminating the nested recursion represented by that proc.
I have all day to code and debug :)

Posted on 2007-01-03 17:19:31 by Homer