You can see that it's not too difficult to write code to produce the Triangle Tree(s) as described, and to decide when we've gone 'deep enough' and thus reached a Leaf Node (ran out of heightmap pixel resolution, or reached our prescribed subpixel limit).

Now let's talk about VARIANCE.

For any triangle, let's sample the height at the Hypotenuse MidPoint (halfway along Edge AB), and compare it to the height obtained by INTERPOLATION of Points A and B (with a weight of 0.5f)... it's the same point in XZ, but the Y values will (usually) be different.

I define Variance as the difference between the Sampled and Interpolated height of the H-MidPoint - it's a measure of 'height discrepancy', an 'error term'.
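
That measurement is easy to express in code. Here's a minimal sketch (Python for illustration only - the 2D-list heightmap layout and integer heightmap-space coordinates are my assumptions, not the engine's actual data structures):

```python
# Hypothetical sketch of the Variance measurement.
# 'heightmap' is assumed to be a 2D list indexed as heightmap[z][x];
# A and B are the integer (x, z) endpoints of the hypotenuse.

def variance(heightmap, A, B):
    ax, az = A
    bx, bz = B
    # The H-MidPoint: halfway along edge AB in XZ.
    mx, mz = (ax + bx) // 2, (az + bz) // 2
    sampled = heightmap[mz][mx]
    # Height predicted by interpolating A and B with a weight of 0.5.
    interpolated = 0.5 * (heightmap[az][ax] + heightmap[bz][bx])
    return abs(sampled - interpolated)
```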

Triangles that are over relatively 'flat' regions will have a very low Variance value (this includes uniformly sloped regions, where interpolation predicts the height almost exactly).

Triangles that are over 'hilly' regions will have much higher Variance.

While generating our Tree(s), when we reach a LEAF NODE, we calculate the Variance for that Triangle, then return from our recursion... and as we return to each Parent Node, we simply sum the Variance of its two Children.

This makes sense, since the area of each Parent triangle is twice that of each of its Children.

So we only calculate the Variance at the Leaves (the smallest triangles), and then recursively add those values, so that larger triangles have larger Variances based on the Variance of the triangles they contain.
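
The build pass can be sketched like so - the 'Node' class here is a hypothetical stand-in for the real binary triangle tree node; only the summing rule is taken from the description above:

```python
# Sketch of the one-time Variance build over a binary triangle tree.

class Node:
    def __init__(self, left=None, right=None, leaf_variance=0.0):
        self.left = left
        self.right = right
        self.leaf_variance = leaf_variance   # measured at leaves only
        self.variance = 0.0

def build_variance(node):
    if node.left is None and node.right is None:
        # Leaf: Variance is measured directly from the heightmap.
        node.variance = node.leaf_variance
    else:
        # Parent: simply the sum of its two children's Variances.
        node.variance = build_variance(node.left) + build_variance(node.right)
    return node.variance
```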

Now that all happens just once, as we build our Tree(s).

When it comes to rendering, we walk our tree(s), looking at the Variance.

For a given node, if the Variance is small enough, we want to quit walking the tree and just draw that node's triangle... but if the Variance is large enough, we will continue down until we reach a Leaf - if we reach a Leaf, we ALWAYS want to draw it.

So large variances cause us to travel further down the tree, and draw smaller triangles.

And small variances cause us to quit early, drawing larger, flatter triangles.

The only thing I've not discussed is how the Camera View affects this process.

Triangles that are far away from the camera are less important, so we can accept them as they are and not recurse (split) them - we want more detail in the close-up stuff and less in the distant stuff.

Basically I just scale the Variance of a triangle by its distance from the camera (dividing it by the distance, so that distant triangles produce smaller values), and compare the result to a constant 'C' ... if the result is less than C, I'll quit recursion and accept the current Node's triangle for rendering.

And if the result is MORE than C, I'll keep recursing until I find an acceptable triangle, or until I reach a Leaf.

So we're using the camera distance to scale the Variance - if a triangle is relatively flat (low Variance), then we'll need to get a lot closer to it before it will Split... if a triangle is steep (high Variance), then splitting will occur from a greater viewing distance.
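
The render-time walk might look like the sketch below. The exact form of the test is my assumption (here Variance is divided by camera distance, so distant triangles are accepted earlier); only the 'compare scaled Variance against C, always draw Leaves' logic comes from the description above. Nodes are plain dicts, and 'draw' is a hypothetical callback:

```python
# Sketch of the render-time tree walk with a distance-scaled Variance test.

def walk(node, distance, C, draw):
    # Leaves (no children) are ALWAYS drawn; otherwise accept this
    # node's triangle early when its scaled Variance is below C.
    if node["left"] is None or node["variance"] / distance < C:
        draw(node)
    else:
        walk(node["left"], distance, C, draw)
        walk(node["right"], distance, C, draw)
```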

Am I making any sense?

I'm uploading an AVI movie to show this scheme in action; you can see that the flatter areas don't want to split until we get REALLY close, and the lumpy regions will split more easily at greater distances.

Note that we're not actually splitting anything, we already have the tree at full resolution, we're merely terminating recursion early to produce larger triangles.

http://stig.servehttp.com/homer/DLOD.avi

Managed to grab ~10 megs of that AVI, then the server stopped responding :(

It's 12 meg - perhaps I was still uploading it? It's there now.

I'm not sure what the deal was there, I did post before the upload was completed - it certainly did complete, but I know that the host was definitely down when I checked a moment ago :|

Although that short movie doesn't really show it clearly, the existing code produces 'T junctions' which in turn cause 'cracks' to appear in the surface of the terrain.

Some years ago, a fellow named Seumas McNally (RIP) popularized an algorithm called 'ROAM', whose major aim was to address exactly this problem. It was later extended by other authors; note that I refer only to the original 'Split-Only ROAM' algorithm in this post.

But before I talk about that, I'll have to describe exactly how I'm splitting my triangles.

I decided that I wanted to be able to use 'backface culling', which means all the triangles need to be constructed with the same winding direction. So when I create the two Root triangles (which form our most coarse level of tessellation, forming a single textured quad), I was careful to select the vertices for each root triangle such that (a) the winding order is the same and (b) the Hypotenuse is always described first (AB is the hypotenuse of triangle ABC).

This allowed me to create a single function with similarly simple rules to split such a triangle in two in a consistent way... given a triangle ABC where AB is the hypotenuse, and calculating a point O at the midpoint of edge AB, our child triangles are CAO and BCO... these child triangles have the correct winding order and the hypotenuse as their first edge.

I constructed my Left root triangle as:

A=0,height-1

B=width-1,0

C=0,0

I constructed my Right root triangle as:

A=width-1,0

B=0,height-1

C=width-1,height-1

where width and height are the dimensions of the Heightmap image.

I construct my triangles in Heightmap Space.

But the function that creates Vertices scales them up to World Space (Position), and down to Texture Space (UV).

So don't worry that the coordinates that I'm using don't sound like much of a large world.
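
Those rules are compact enough to sketch directly. Triangles here are tuples of (x, z) heightmap-space points - my own representation, chosen purely for illustration:

```python
# The split rule and root triangles, sketched from the rules above.

def split(A, B, C):
    # O is the midpoint of the hypotenuse AB; the children are CAO and
    # BCO, each with consistent winding and its hypotenuse listed first.
    O = ((A[0] + B[0]) // 2, (A[1] + B[1]) // 2)
    return (C, A, O), (B, C, O)

def root_triangles(width, height):
    # The two root triangles forming a single textured quad,
    # in heightmap space.
    left  = ((0, height - 1), (width - 1, 0), (0, 0))
    right = ((width - 1, 0), (0, height - 1), (width - 1, height - 1))
    return left, right
```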

Gah, forget it - no offence to the late Seumas McNally, but I'm totally over this whole ROAM thing.

The amount of complexity and extra recursions involved in 'repairing T junctions to eliminate cracks' is just silly.

And the binary triangle tree? Forget it - I think in the late 90s everyone was just going silly with binary trees since Quake was such a big deal; everyone wanted to apply BSP-like algorithms, especially to outdoor scenes, which were at the time seen as 'difficult' (since BSP wasn't suitable for them).

Two years ago I used QuadTrees to create a 3D realtime terrain painting demo, and I can tell you that:

-QuadTrees produce a more compact, less deep tree structure

-QuadTrees are faster to recurse as a consequence

-QuadTrees produce NO T JUNCTIONS in the first place, so there's no cost to 'eliminate' them.

So - they're less complex, they're more memory efficient, they're faster, and they don't suffer from cracks.

Well, at least I can say I try everything once... sigh, it's typical that the first solution I came up with was superior.

I'll go back and rewrite the affected code, and hopefully make a few improvements over my earlier work.

Work on the new improved QuadTree-based DLOD terrain is slow but steady.

Today I realized that I can eliminate the deepest level of my tree entirely, saving memory and CPU time.

This is as simple as changing our definition of a leaf node...

FROM : "If this Node has no Children, then it is a Leaf Node"

TO : "If this Child Pointer is NULL, then it is a Leaf Quadrant"

Essentially, we don't need leaf nodes in the conventional sense - the geometry which would have been emitted by a leaf node can just as easily be generated by its parent on a per-child basis.

Although this won't work for EVERY kind of tree structure in the universe, it DOES work in this case: since we build our tree from the root down, each Node knows a lot about its theoretical children.
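
A tiny sketch of the idea, using a hypothetical 'QuadNode' with four child pointers - a None pointer marks a 'Leaf Quadrant' whose geometry the parent emits itself, so no dedicated leaf nodes are ever allocated:

```python
# Sketch: parents emit geometry for NULL (None) child quadrants directly.

class QuadNode:
    def __init__(self, depth, children=None):
        self.depth = depth
        self.children = children or [None, None, None, None]

def emit(node, out):
    for quadrant, child in enumerate(node.children):
        if child is None:
            # Leaf Quadrant: the parent generates the geometry that a
            # dedicated leaf node would have emitted.
            out.append((node.depth, quadrant))
        else:
            emit(child, out)
```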

I just thought this simple idea was so cool I had to post about it.

Spent the last couple of days trying to come up with a more efficient way of finding 'neighbours' of a given quadtree node for arbitrarily-constructed trees... not much success there. But after experimenting with variations on numbering systems for a while, I realized that an ordered numbering system was as simple as defining a linear equation which takes into account the two axes of the quadtree coordinate system as well as the Level of Density (which is simply the recursion depth of a given set of nodes in the tree).

Here's an image which illustrates the scheme I have in mind... it says nothing about the implementation, but it's a great place to start.

I've implemented it, heh.
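
For the record, here's one plausible realization of such a numbering - purely illustrative, and almost certainly not identical to the actual implementation. Cells at recursion level L form a 2^L x 2^L grid, and a single linear expression maps (level, x, y) to a unique ordered ID, which also makes same-level neighbour lookup trivial:

```python
# Hypothetical ordered numbering for quadtree cells.

def cell_id(level, x, y):
    side = 1 << level
    # Total cells in all shallower levels: (4^level - 1) / 3.
    offset = ((4 ** level) - 1) // 3
    return offset + y * side + x

def neighbours(level, x, y):
    # IDs of the edge neighbours that exist at the same level.
    side = 1 << level
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [cell_id(level, nx, ny)
            for nx, ny in candidates
            if 0 <= nx < side and 0 <= ny < side]
```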

Biterider has been doing some good work on ObjAsm32's RadASM plugin... it's now stable, and has more features.

I've been working more on the Terrain stuff, last night I implemented view frustum culling on a PER TRIANGLE basis - and to my disbelief, doubled my framerate !! :O

Now I'm implementing the 'T-Junction elimination', which introduces two passes to the realtime Tessellation.

Pass 1 : recurse the quadtree, determining which level of density to use for each node, while activating and deactivating the 'bounding points' of each node and the neighbours of each node.

Pass 2 : recurse the quadtree, emitting triangles for rendering, guided by the active bounding points of each node.

I really, REALLY didn't want to recurse the tree twice, but I simply don't see a viable alternative, short of using some kind of queue that would add complexity, use more memory, and introduce MORE cache misses - so I'm trying to limit the recursion to the nodes we actually care about.

Anyway, I'll explain this in more detail once I'm happy with the implementation... no point in discussing 'organic code' as I'm very likely to change things.

The attached image describes how tesselation is guided by the 'active bounding points' of a given Node... if we decide to recurse a QUADRANT of a Node, we DISABLE the associated CORNER point, which prevents tesselation in that Quadrant of the Node.

I've had a fair bit of trouble with this project, but last night I found two major problems:

#1 - the function I use to determine a sane Identifier for each cell was misbehaving.

#2 - the function I use to find the neighbours of a given cell (based on the ID) was also misbehaving.

Either one of these would explain the unexpected results I was seeing, but together, there was no obvious pattern, so I'd been chasing around in the wrong part of the code looking for a problem that wasn't there.

How many times can you slap yourself in the forehead before the welt becomes permanent? :lol:

Hopefully now I'll be able to wrap this thing up and move on, it's already taken longer than I expected.

On the bright side, I'm seeing an average 400% improvement in framerate over my earlier (2 year old) design.

I finally completed the terrain engine, and I'm not happy with the results.

For relatively small worlds, the framerates are impressive, but for 'massive' worlds, the framerate plummets, mostly due to the number of cache misses introduced by the deeper quadtree.

To put it plainly, it's not good enough.

I'm only emitting several thousand triangles to the screen; I expect my framerate to be a lot higher.

Perhaps I expect too much, perhaps I'm not being realistic, but I have a good video card which should easily be able to produce several MILLION triangles per second.

So I'm going to try an idea that I wrote about here a long, long time ago.

My idea is to find the 2D projection of the camera view frustum upon the heightmap, and then tessellate a series of triangle strips by sampling a grid that is deformed to fit the frustum's 2D projection.

Although the following description isn't exactly accurate, it's easier to understand:

Our camera frustum's 2D projection is a deformed square.

We can quickly generate a tessellation within that deformed square, representing ONLY the portion of terrain which is currently visible.

Advantages of this technique include the following:

-There's NO TREE - so we can use MASSIVE heightmaps 'for free'.

-Frames always take exactly the same amount of time to generate.

-By using bilinear filtering while sampling the heightmap, there are no 'popping effects'.

-Since there's no popping, we don't need to Geomorph, so there are no 'morph sliding effects'.

-We can use a static Index Buffer.

-There is no Paging required unless the heightmap exceeds 4096x4096 pixels.

Disadvantages include:

-We must refill a (small) Vertex Buffer every time the view changes.
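
The bilinear sampling mentioned above is what removes the popping: grid points land at fractional heightmap coordinates, so the sampled height changes smoothly as the grid slides with the camera. A sketch (the 2D-list heightmap layout is my assumption):

```python
import math

# Sketch: bilinearly sample the heightmap at a fractional (u, v)
# position in heightmap space.

def sample_bilinear(heightmap, u, v):
    x0, z0 = int(math.floor(u)), int(math.floor(v))
    x1 = min(x0 + 1, len(heightmap[0]) - 1)
    z1 = min(z0 + 1, len(heightmap) - 1)
    fx, fz = u - x0, v - z0
    # Interpolate along x on both rows, then blend along z.
    top = heightmap[z0][x0] * (1 - fx) + heightmap[z0][x1] * fx
    bot = heightmap[z1][x0] * (1 - fx) + heightmap[z1][x1] * fx
    return top * (1 - fz) + bot * fz
```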

From my experience, it's much better to buffer ALL possible vertices in a vertex buffer and then schedule the rendering of the relevant (read: visible) ones, instead of refilling the buffer in each frame.

As for the 'framerate' problem: GFX accelerators are limited by (among other things) their fillrate, so even with a small number of triangles you can't get zillions of FPS. That's why 'FPS performance' should be tested with some decent amount of geometry. Otherwise you're just testing your triangle culling mechanisms and your card's fillrate.

BTW: When do we get some demo? :)

For a small chunk of terrain, that is perfectly reasonable... 64x64 pixels worth of heightmap? No problem.

However, when we start dealing with large-scale visualization, it's simply not practical to buffer every vertex - and paging systems are not really a solution, because they amount to a partitioning scheme. In the end we are back to having a tree with n levels, where each level describes a level of detail, and then the cost of determining which vertex indices to emit to the pipeline for a given frame becomes very expensive - so it's really no better than a quadtree or any other tree scheme, but it suffers the extra cost of having to 'stitch' the patch boundaries.

So instead of using a large static array of vertices and a small dynamic array of indices, where the expensive part is discovering which indices to use, I'm thinking of sparsely sampling the heightmap data and writing out a relatively small number of dynamic vertices, while using a small static index buffer to render them.
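
Generating that small static index buffer is straightforward - here's a sketch for a (rows x cols) grid of vertices emitted as an indexed triangle list (a real implementation might emit actual strips; the point is only that these indices never change between frames, so only the vertex buffer is refilled when the view moves):

```python
# Sketch: build-once indices for a grid of (rows x cols) vertices.

def grid_indices(rows, cols):
    indices = []
    for z in range(rows - 1):
        for x in range(cols - 1):
            i = z * cols + x
            # Two triangles per grid cell, consistent winding.
            indices += [i, i + 1, i + cols]
            indices += [i + 1, i + cols + 1, i + cols]
    return indices
```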

I'm telling you that my video card was achieving 1700 fps using simple brute-force rendering of the quadtree's finest level of detail, compared to 700 to 1200 fps when using split-only ROAM tessellation, and I'm stating that the bottleneck is CPU cache misses, which increase rapidly with tree complexity (depth). Mostly this is because the Nodes of a tree are not linear in memory - something which can only ever be partially worked around, given that our recursion or 'walk' of the tree will typically not be linear either (we don't walk every node in order - that's the whole point of such partitioning schemes - and stitching / crack elimination schemes require us to visit 'neighbours', which most certainly introduces more of the same).

Modern video cards are making brute force look viable - the scheme I'm describing (sparse dynamic heightmap oversampling) uses no tree, memory accesses are linear, and the emitted geometry is 'stripified'.

Provided I can shovel triangles to the GPU fast enough, I've just eliminated the most expensive part of the algorithm, while eliminating a couple of visual artefacts and taking the load off the CPU and system memory.

It might not sound elegant - in some ways it can be seen as a variation on the simple brute-force approach - but when you think about it, there are enough benefits to make it at least worthy of a second glance.

Originally I was planning on using a rasterization technique to identify which HeightMap Pixels are inside the area defined by the XZ projection of the view frustum, and then tesselate between them.

I've realized that won't produce the effect I desire - it will look a little rough. The real solution I want is to define a uniform grid of heightmap selection points within the 'unprojected' (no perspective) 2D projection of the frustum (which is a regular rectangle). Then, with the camera facing in a default direction, project these points ONCE to deform them for perspective, setting up a static set of indices that define a stripified tessellation. At runtime, rotate and translate this 'template' array of points with the camera before resampling the height data and spitting out the appropriate vertices. I think with some care, this can be done almost exclusively on the GPU using a vertex shader :D There's a trick whereby we can pass an array to a VS, and I think this might be the technique I'll look at.

Oh, you asked for a demo - of what? code I'm not happy with? You still wanna see that?

There's an update to the ObjAsm32 package in the works; I'll be submitting a DirectX demo whose source code provides support for instancing of animated skinmeshes and, perhaps of more interest, the ability to attach instances of static meshes to instances of skinmeshes in a way that allows each model to have unique accessories (weapons and armour? want to mount a machine gun on your Camaro?).

I'd also like to submit my networking engine but I am not comfortable with its current state of development, I think it needs a little more polish and some thorough testing.
