The attached demo code imports geometry from an XFile (collapses mesh hierarchy but preserves materials and FVF etc), and generates an MX Octree from it.. blisteringly fast.. fast enough for realtime, and  typically generates FAR fewer 'splits' than bsp does.

All the democode I had seen for MX octrees involved distributing POINTS in space, not FACES - I reworked a bit of my bsptree code to cut faces against planes where necessary.

Right now you are probably thinking "I know what an octree is - but what's an MX Octree, and how does it differ from a conventional one?"

Conventional octrees define a boundingbox around the entire set of input faces, and then chop that BBox into 8 child nodes (known as 'octants').. the input set of faces is then inserted into the 8 octants by testing the faces against the (smaller) octant boxes.

MX octrees define the implicit origin of the BBox, rather than its explicit corners - imagine a 3D crosshair in the middle of the input set of faces, now test each face against that crosshair - we're performing "point-point tests" instead of "box-poly tests", and consequently its MUCH faster.

MX octrees are more suited to situations where the faces are not evenly distributed in worldspace, ie there are large regions containing relatively few polygons, and smaller regions which are more densely populated.. They are just great for outdoor scenes.

The result is almost IDENTICAL to the output of a conventional octree generator, it's just a LOT faster to produce.

This demo has the input xfile's name hardcoded into it, that's because this code is brand spanking new .. I'll spend a couple of days importing my 'Frustum Culling' code into it, and then we'll be able to begin taking advantage of the octree's structure, which is in essence a hierarchy of boundingboxes .. if a given node's BBox intersects the camera frustum, then it and its children are potentially visible .. but if a given node's BBox does NOT intersect the frustum, then that node, and any children, are not visible, and don't need to be rendered :)

I have already identified a couple of ways to speed up this code even more, but please feel free to comment, make suggestions, criticisms, etc.
Also note that as usual, this code is unlicensed, feel free to steal my source and use it in any way you see fit.
Posted on 2007-01-14 23:54:17 by Homer
The reason that I used tiger.x in this demo is that this model is rather small - the values of its vertices are very small numbers.
When we start chopping the model up into octants, those numbers very quickly become very much smaller - and here lies the greatest danger for any spatial subdivision implementation - when numbers are very large or very small, they are highly sensitive to numerical drift.
When I tried reducing the maximum allowable number of faces per node to a value below 100, things start to go wrong.
The major problem I identified today involves the code which handles those faces found to be coplanar with a splitting plane - when the numbers get too small, the reclassification fails.
I replaced that code with the following (at line 47 in SplitTriangleWithPlane)

;Compare Face Normal to Plane Normal
;How different are the angles?
lea eax,vtemp
Vec3Dot eax,pFace
fstp vtemp.x
fMin vtemp.x,fEpsilon ;if less than epsilon, we'll consider it zero
fstpReg eax
.if eax==vtemp.x ;if its zero (or close enough), return Front
  return FRONT
return BACK

Also, I added a couple of lines of code to the BuildOctreeNode function so that we stop recursing when fHalfRadius is less than fEpsilon. Without that, all hell breaks loose since our octants are actually smaller than the tolerance (thickness) of our planes.

This kind of logic must also be applied to faces themselves.
When a face is split, I should check if the output fragments are 'legal' either by checking their Area against some minimum value, or by checking their edge lengths against fEpsilon.
If we find that a split has produced triangles which fail such a safety check, we should discard our split triangles in favour of their parent face (which we split), and place the entire parent on BOTH sides of the plane.

These actions will prevent the creation of triangles which would be problematic in regards to collision detection and any other tests that are prone to numerical error, and since we put offending faces in BOTH neighbour octants, they're available for testing with respect to both the nodes they share.

Do you agree that the above statements seem sound?
What do you think?
Posted on 2007-01-15 08:00:33 by Homer
I'm considering adding a little more code that would effectively allow this MX octree to be 'adaptive'. We allow the node origins (those imaginary crosshairs) to be repositioned slightly for the purpose of avoiding splitting of faces.
In order to achieve this feat, my solution would involve creating three lists of 'axial span values' which represent the axial ranges covered by the input faces (kinda like a spanbuffer).
These three lists would require sorting, and then from the sorted list we can determine where along that axis we ought to place our axial splitting plane.. do this for all three major axes, and we have our new origin. The cost might seem high - the input faces are sorted a total of 9 times per output node - but with a little effort these lists can be sorted just once and maintained throughout the treegen process.
Regardless, they're still cheaper than the popular method used to choose a splittingplane in a bsptree gen (comparing every face against the plane of every other face).

Your thoughts?
Posted on 2007-01-15 17:24:31 by Homer
Fixed a small bug in the coplanar handling code I posted the other day..still haven't implemented any proposed changes (such as split avoidance).

I created a small demo 'terrain' mesh, roughly 3300 vertices.
When I generate an octree from this, with a faces-per-node limit of around 100, I end up with around 20k faces (didn't check how deep the tree was, but it's quite shallow, something like 4).

Does this seem like a reasonable result?

What is a reasonable max. number of faces per node?
I think 100 is probably a bit overzealous for a 'real game level' where we might have something like 100k faces BEFORE treating them to the treebuilding/facesplitting algorithm..
Anything less than 50 creates quite a deep tree, and MANY more split faces, which makes sense to me, and is quite unreasonable.

I can't find anything on google to benchmark against :(

I just tried increasing the test model's resolution to 8192 faces, and again used a limit of max. 100 faces per node, this time I generated 29k faces at about the same tree depth.. looks like the EXISTING algorithm scales well to larger inputs :)

Posted on 2007-01-18 23:24:13 by Homer
What's really sad, for me, is, not too many years ago, you and I could have had a down and dirty mixture of good discussion and code swapping cause I was so into this stuff.  You just came around maybe 10 years too late for me, Homer.
Posted on 2007-01-19 00:17:46 by drhowarddrfine
I've just been reading a paper which talks about using a 'covariance matrix' to calculate the best ARBITRARY splitter plane by analyzing the distribution of the input data.
This, in combination with a KDTree algorithm, would beat the hell out of adaptive mx octree, yet still retain all the benefits of a BSP.
However I have not found any actual implementation of it, so it's probably pie in the sky.
Posted on 2007-01-19 00:29:18 by Homer
Just added a new function called "IsLegalFace"..
Whenever we create 'fragment' faces (by splitting a input face), we check the lengths of the three edges of those fragments to make sure that the length is larger than fEpsilon.
Triangles with edges smaller than fEpsilon are considered 'illegal'.

I have found that yes, I am generating illegal geometry.
Since I haven't applied this check to the input set of faces themselves, I can't yet tell if I'm actually IMPORTING some illgal faces, but it sure would seem to be the case, because I'm seeing fragments with one edge of ZERO LENGTH (two vertices are exactly the same values).
That would not suprise me at all since I used MAYA to generate the test model, and it probably contains 'dummy triangles' to facilitate 'stripification' for accelerated rendering purposes.

Later today I'll be adding code to check for illegal faces within the Xfile Importer function, and simply throwing away any faces deemed to be illegal during that phase.
Having done that, if I STILL see illegal faces being generated, then I'll be re-evaluating my proposed 'split avoidance' mechanism in order to deal with these nasty little buggers.

Posted on 2007-01-19 01:34:58 by Homer
I've attached an update.
The hardcoded name of file being tested is 'test.x"
If this version sees an illegal face being imported from the demo xfile, it will issue a debug warning, then crap out and die unceremoniously.
If no illegal faces are imported, it will try to generate mx octree from imported data as usual.
If any illegal faces are created during octree generation, it similarly complains and terminates.. no apologies, no excuses.
I'm determined to find the best way to deal with these 'chips', and a little worried that maybe somewhere my tri-splitter functions are broken and are causing them, I need to add more debugging I guess.

I believe I can 'predict' chips before they are generated.
In fact I've got two different algo variants in mind.
Gotta think them both out.

Posted on 2007-01-19 04:29:24 by Homer
I haven't implemented 'illegal chip prediction" yet..
Instead, I have implemented "illegal chip handling" via a new case of "SPLIT_NOT_POSSIBLE"..
When I detect a that triangle is unable to be split against a plane (because it would generate illegal chips),  I simply leave it hanging around in the input collection, so that it's 'been stored in the first node that it totally fits within".
Now, when the tree generation is completed, faces may be found at ANY depth, not just in the leaves...
-Faces in non-leaf nodes are the ones that we decided it was a bad idea to split since doing so would generate illegal chips..
-Faces in leafnodes have been clipped to the node' splittingplanes.

The result is a bit slower, but look on the bright side.. all the faces we did generate are guaranteed to not be "mutants", and the number of new faces and new vertices are a good deal lower :)

I'm being quite overzealous about my error detection now I think :P
I'll post an update tomorrow when I've had time to test properly.

Posted on 2007-01-19 09:37:22 by Homer
The attached update contains the changes for identifying and eliminating 'illegal chips' by leaving offending faces in the parent nodes.
Recursion terminates when all octants (child nodes) contain fewer than 100 polygons, or when the nodesize falls below fEpsilon (our planar tolerance or 'thickness').

My personal benchmarks are as follows:

Without IllegalChip Handling
2k input faces = 20k sorted faces
8k input faces = 29k sorted faces

With IllegalChip Handling
2k input faces = 16k sorted faces
8k input faces = 22k sorted faces

Cleaned up the code a little - removed unnecessary face validations.
Posted on 2007-01-19 17:59:27 by Homer
I've updated the previous attachment - there was a rather serious memory leak which has been fixed.
Since my face-splitting code is itself recursive in nature, I must release any fragments which are 'subsequently split'. I was releasing input faces that got split, but not the intermediate fragments that got re-split.
Posted on 2007-01-19 22:38:34 by Homer
I've written some code to save the generated octree to a custom datafile, and some code to recreate the tree from a datafile.
The file format is fairly compact (for example, the 8 octant pointers for a node are encoded as a single byte) but could certainly be a lot smaller than it is.. not really critical at the moment.

I should be able to get some simple render code added next, which renders the reloaded octree data (probably using sluggish DrawPrimitiveUP). We'll worry about render optimizing soon enough.

Update coming soon !!
Posted on 2007-01-20 02:09:02 by Homer
Found some time this afternoon to add code for 'fast frustum extraction via matrix deformation of a unit cube'.

The attached update calculates the frustum, and has been set up to recalculate it whenever the camera view changes (via a boolean flag)..  since this example currently contains no code for 'moving' or 'rotating' the camera view, the frustum never requires recalculation, and so it doesn't occur.

The next update will probably contain some crude camera controls and 'something' being rendered (so we can test the camera controls).
After that I'll implement frustum culling apon our 'something'.

Posted on 2007-01-27 10:00:43 by Homer
Sorry about the delay..two small bugs, much pain later..

I discovered a small problem in the function used for extracting the View Frustum's 6 planes from the current camera view/proj matrices.
The problem was that the fDistance values in those Planes were actually negative (ie -fDistance).

My test rendering code didn't seem to work, so I decided to split the project : the code previously posted ('Project A') is now considered to be the 'octree generator tool' - it takes an input XFile, and spits out an Octree saved in a custom file format.
The attachment ('Project B') contains some VERY crude camera code, the 'bugfixed' Frustum code, some code for loading the custom file (not yet being used), some proven working demo render code, and enough code to test arbitrary points against the frustum (ie, is point P inside or outside the current view frustum).
I've rigorously tested the Frustum code, and I'm satisfied that its fine, so long as your 'near plane distance' is greater than zero.
If you use zero for the near plane, then the Frustum isn't really a deformed cube anymore, its a pointy pyramid, with the near four vertices collapsed into a singularity at the Eye Position... d'oh.

Next I'll add code to test whether arbitrary Nodes of the Octree are inside or outside the frustum, and we'll begin thinking about rendering of node content.

Posted on 2007-02-02 01:35:03 by Homer

Got your thinking hat on?

The next task is to devise the fastest possible test(s) for Octree visibility. First, let's talk about the 'mathematical tools' we have in our arsenal which are potentially useful in regards to this problem.

A) Point/Octant testing via ClassifyPointOctant
This function currently resides in 'Project A'.. it's really fast :)

B) Sphere/Octant testing via modified ClassifyPointOctant
With a little work, ClassifyPointOctant could be modified to return the 'shortest Axial Distance to CuttingPlane', which can then be compared to a sphere radius value.. that'll make more sense shortly.
The modified version would also be quite fast.

C) Point/Plane testing via ClassifyPointPlane
This is a little expensive as it needs a DotProduct operation.

D) Sphere/Plane testing via modified ClassifyPointPlane
The ClassifyPointPlane function can not only tell us which 'side' or 'halfspace' of a Plane contains a given Point, it also calculates the actual distance from the Point to the Plane along the Normal of that plane (which is always the SHORTEST DISTANCE). The version I've provided so far can EASILY be modified to return that distance value.
We merely need to check that Distance is >= fSphereRadius... if the distance is less than the Radius, then the Sphere intersects the Plane.

Now we have a realistic looking set of utility functions, we can begin to devise our algorithm. Since point-based tests are cheap, I'm going to suggest immediately that we wrap every Octant in a theoretical BoundingSphere, ignoring the fact that they overlap.
We can perform a quick test for frustum visibility of each Octant by testing its BoundingSphere against the Frustum planes.
This should be considered a 'potential visibility' test, because it's not accurate - however, it errs on the side of caution.. it will sometimes return 'false positives'.
For each Node that appears to intersect the Frustum, we perform more accurate tests.. We can QUICKLY test the 8 vertices of the Frustum to discover which Octants they live in.. less quickly, we can test the 8 BoundingBox vertices of a Node against the Frustum planes. Note that I've listed the slowest stuff last.
Without actually spelling it out, can you see how this algorithm is beginning to take shape?
Posted on 2007-02-02 06:59:12 by Homer
It's important that we make the first visibility test (recursively applied to each octree node until failure) as quick and as brutal as possible.
I mentioned that I was considering wrapping each Node with a theoretical boundingsphere. This leads us to investigate and understand the nature of 'SphereTrees', and we learn that we can easily convert an Octree into a SphereTree.

I've modified the Octree generator code.. specifically, the recursive function called BuildOctreeNode.
As we know, this function is handed a set of input Faces which it then attempts to distribute into up to 8 subnodes.
What I've done is add a new function call .. we calculate the BoundingSphere (radius and origin) for the input set of faces BEFORE we distribute them.
This boundingsphere may be larger than the node's theoretical Box (and thus overlap sibling nodes), or it may be smaller, that's totally dependant on the ACTUAL GEOMETRY of the input faces.
I've added fields to the OctNode structure to store the boundingsphere radius and origin information.
The result of these changes is that we are in fact building a SphereTree WITHIN our Octree, at very little cost, and with the tightest possible fit over the geometry.
Every node including the root node has a sphere.

I have yet to modify the customfile save/load code to match.

Our first visibility test will be as follows:
(Walking the Tree from Root down as usual,)
If the current node's boundingsphere radius is larger than the frustum  view depth (ie fFar - fNear), then this node could possibly contain the ENTIRE frustum, so we perform a special test to see if any of the frustum VERTICES are inside the Sphere.. if not, we return from recursion.. otherwise, we need to check each active child node.
If the current node's sphere radius is NOT larger than the frustum view depth, then the frustum could possibly contain the ENTIRE sphere, so we test the sphere against the frustum planes using a modified point test. We only continue recursion if the sphere is partially or totally inside the frustum, which is the whole point of the exercise.
By performing two kinds of testing, we can efficiently handle massive worlds and small dense regions with the same code.. I do assume that the world is a big place, after all, why else would we be performing visibility-based culling of volume hierarchies? :)

I've written all the testing functions I've previously described, expect an update soon :)

If you have questions or ideas you wish to share, please do so!

Posted on 2007-02-03 09:48:51 by Homer
I've changed my mind, and my code.
The following solution offers the tightest possible fit of spheres over the geometry.

The call to generate each node's boundingsphere is now being made AFTER the faces have been sorted into childnodes, and not before.
This means that only nodes which STILL contain faces after the sorting process are the nodes which have a sphere.
Nodes which don't contain geometry don't have a sphere.
These nodes are actually redundant to us !!!
We can trim them out of the Tree :D
I'm not doing that yet, but I MAY do so.

Walking the spheretree for Rendering is done as follows:

-Render any 'remnant faces' in the current Node
-For each active Child of the current Node,
--Test for intersection of Child's VisibilitySphere and the Frustum
--Recurse any child whose Sphere intersects Frustum
-Return from Recursion

Thus, we use the Octree links for Walking, but we use the PolygonSpheres for Visibility Testing.
Almost time to post updated source :)

Posted on 2007-02-04 06:34:53 by Homer
Did you notice I used the term 'HyperBox' to describe the 'inverted boundingbox' of a Node in our Tree?

I say it's inverted because we defined it as a central point with three axial cutting planes passing through it - essentially, each and every one of our Boxes is "infinitely big" - it doesn't have any 'outside'.

The only sense of size is given by the 'stepover' aka the 'halfwidth' value which is passed down the Tree during its creation in order to guide the placement of any Child nodes.

We're using the HyperBoxes to uniformly cut non-empty space into subspaces which may or may not be empty.
Our octree nodes represent a hierarchy of non-empty subspaces, and a map of empty subspaces.
The tight-fitting Spheres (that only exist in nodes that contain 'Renderables') will in most cases more accurately describe the 'Potentially Renderable Volume' owned by a given Node.
Not only does this allow for more 'hungry' culling of our tree, it means  we use a single Sphere/Frustum test rather than an expensive Box/Frustum test (8 * point/frustum tests) to test the visibility at each Node.

Hmmz, more accurate, more efficient AND faster?
I believe I've earned myself a beer :)
Posted on 2007-02-04 07:07:01 by Homer
The next task:

I must walk the generated but unsaved tree.
At each non-empty node, I must create a list of the vertices referenced by its faces, and then edit those faces to use the new vertices.
Remember that I switched from 16bit indices to 32bit indices/pointers some time ago, in order to break the 64k vertex limit?
We're splitting up our one (possibly huge) input vertexbuffer into lots of small vertexbuffers. This will create some duplicate vertices, but the benefit is huge:
it allows us to use 16 bit indices for our vertices even when we have more vertices in total than would fit in 16 bits.
That means a saving both in diskspace and in memory of almost 50%.

It will require additional changes to the 'load' and 'save' functions.
Vertices will now be in a form suitable for loading (on a per-Node basis) DIRECTLY into D3D vertexbuffers for Rendering (rather than heapmem etc).. all we need is an indexbuffer constructed from the face indices at load-time.. perhaps if we saved per-node faces in a different format than we do currently.. so we could load all those indices in a chunk directly into a D3D indexbuffer also.. heh.

Lots of pros, no obvious cons, lets do it :)

Posted on 2007-02-04 07:22:49 by Homer
I've completed all the proposed changes in Project A.

When BuildOctreeNode finishes processing a Node, and that Node contains 1 or more Faces, I build a list of the vertices referenced by that faceset and store it in that Node.. while doing so, I replace the vertexpointers in those faces with indices into the new vertexlist.
The code for Saving the tree has been modified to suit.
Nodes which contain Faces have their vertices saved as a contiguous chunk of data, and their face indices are similarly saved as a contiguous block of 16-bit indices local to that node.
That'll make life easier in Project B, since we can load each node's renderables directly into D3D buffers as previously mentioned.

I am disappointed to see that since the gamedev threads were merged into the Main category that my posts are receiving almost zero views.. I believe this is because nobody can find them :(

I'll post an update in the next day or two for ProjectA, and then turn my attention back to Project B, where we'll be making changes to the Loader code im preparation for our new 'octospherical frustum culling'.

Gee, that sounds awful, any suggestions for a better name?
Posted on 2007-02-04 23:30:33 by Homer