Several years ago, when I first started messing around with D3D code, a member of this forum tried to explain to me how it was possible to eliminate texture thrashing and simultaneously pre-sort renderable geometry into opaque and translucent groups.
I'd like to credit Jan for this idea.
At the time I didn't really grasp how this should be implemented.
I figured he was implying that we should hijack certain API calls and redirect them to our own code.
That's certainly one possibility, but there are a thousand ways to skin a cat. The following approach is certainly no slower than API hijacking, and a lot easier to implement.

Idea: Deferred Rendering
- Eliminates 'texture thrashing'
- Reduces changes to the RenderStates

A Renderable is something that we can draw.

Our mission, if we choose to accept it, is to sort Renderables.
First, they are to be sorted into two categories (textured and untextured).
Second, each category is to be sorted into two subcategories (opaques and translucents).
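A minimal C sketch of this two-level sort, assuming a hypothetical `Renderable` record that carries only a texture pointer (NULL meaning untextured) and an alpha-blend flag. The bucket names are illustrative, not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: just enough to classify one draw call. */
typedef struct {
    const void *texture;   /* NULL means untextured */
    int alphaBlend;        /* nonzero means translucent */
} Renderable;

/* Primary key: textured/untextured. Secondary key: opaque/translucent. */
enum { UNTEX_OPAQUE, UNTEX_TRANSLUCENT, TEX_OPAQUE, TEX_TRANSLUCENT, BUCKET_COUNT };

static int classify(const Renderable *r)
{
    int base = (r->texture != NULL) ? TEX_OPAQUE : UNTEX_OPAQUE;
    return base + (r->alphaBlend ? 1 : 0);
}

/* Count how many Renderables land in each of the four collections. */
static void countBuckets(const Renderable *list, size_t n, size_t counts[BUCKET_COUNT])
{
    for (int i = 0; i < BUCKET_COUNT; ++i)
        counts[i] = 0;
    for (size_t i = 0; i < n; ++i) {
        int b = classify(&list[i]);
        assert(b >= 0 && b < BUCKET_COUNT);
        counts[b]++;
    }
}
```

Classification happens once, when the call is recorded, so no separate sort pass is needed at Present time.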

We wish to collect data for Renderables by providing
our own pseudo-versions of D3D's DrawWhatever api functions,
only actually performing rendering via our own pseudo-version
of the Present api.
Note that our pseudo-render functions don't actually do any rendering; they simply sort and collect the data normally associated with calls to D3D rendering functions.
Actual rendering doesn't happen until the user calls (our) Present method.
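The record-now, draw-later idea can be sketched in C as follows. This is a sketch under stated assumptions: `Deferred_DrawPrimitive`, `Deferred_Present` and the `RealDrawFn` callback are hypothetical names, and the callback stands in for the real device call. The queue here replays in submission order; the sorting discussed in this post would be applied to the recorded calls before they are replayed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLS 1024

/* One recorded draw: the parameters the real API call would have received. */
typedef struct {
    int primType;
    unsigned startVertex;
    unsigned primCount;
    const void *texture;
} DrawCall;

/* Hypothetical stand-in for the real device draw call. */
typedef void (*RealDrawFn)(const DrawCall *call, void *user);

static DrawCall g_calls[MAX_CALLS];
static size_t g_callCount = 0;

/* Pseudo-DrawPrimitive: records the call, renders nothing.
   Returns 0 if the queue is full. */
static int Deferred_DrawPrimitive(int primType, unsigned startVertex,
                                  unsigned primCount, const void *texture)
{
    if (g_callCount == MAX_CALLS)
        return 0;
    DrawCall *c = &g_calls[g_callCount++];
    c->primType = primType;
    c->startVertex = startVertex;
    c->primCount = primCount;
    c->texture = texture;
    return 1;
}

/* Pseudo-Present: only now are the recorded calls replayed, after which the
   queue is emptied. A real version would then call the actual Present. */
static size_t Deferred_Present(RealDrawFn draw, void *user)
{
    size_t n = g_callCount;
    assert(draw != NULL);
    for (size_t i = 0; i < n; ++i)
        draw(&g_calls[i], user);
    g_callCount = 0;
    return n;
}

/* Trivial "device" for trying the queue out: just sums primitive counts. */
static void tallyPrimitives(const DrawCall *call, void *user)
{
    *(unsigned long *)user += call->primCount;
}
```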

I'm only going to talk about textured entities; you can assume that the handling of untextured entities is very similar.

We'll need to keep an eye on some of the render states,
especially alpha blending enable/disable (since we can
use that switch to determine in which Collection to store
the current Renderable).
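One way to keep that eye on the render states, sketched in C: the pseudo-SetRenderState only updates a shadow copy, and the current alpha-blend value decides which collection the next Renderable goes into. The `RS_*` IDs and function names are hypothetical; a real wrapper would key on the D3D enum values instead. Because every change passes through the shadow copy, redundant state changes can also be detected and dropped, which is one way to get the reduction in RenderState changes listed above.

```c
#include <assert.h>

/* Hypothetical state IDs; a real wrapper would key on the D3D enum
   (for example D3DRS_ALPHABLENDENABLE) instead. */
enum { RS_ALPHABLENDENABLE, RS_ZENABLE, RS_COUNT };

typedef enum { COLLECT_OPAQUE, COLLECT_TRANSLUCENT } Collection;

/* Shadow copy of the render states; assumes every tracked state starts at 0. */
static unsigned long g_state[RS_COUNT];

/* Pseudo-SetRenderState: updates the shadow copy only.
   Returns 1 if the value actually changed, 0 if the call was redundant. */
static int Deferred_SetRenderState(int state, unsigned long value)
{
    assert(state >= 0 && state < RS_COUNT);
    if (g_state[state] == value)
        return 0;
    g_state[state] = value;
    return 1;
}

/* The collection the next pseudo-Draw call should be stored in. */
static Collection currentCollection(void)
{
    return g_state[RS_ALPHABLENDENABLE] ? COLLECT_TRANSLUCENT : COLLECT_OPAQUE;
}
```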

Each Renderable will require its own FVF data, as well as
any Parameters supplied in the original call.
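Each Renderable needs its own copy because the caller may reuse or free its vertex buffer as soon as the pseudo-Draw call returns, long before Present. A minimal C sketch, assuming user-pointer style draws (vertex data passed in by pointer, as with DrawPrimitiveUP); `RecordedDraw` and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A recorded draw that owns a private copy of its vertex data.
   Field names are hypothetical. */
typedef struct {
    unsigned fvf;          /* vertex format flags as passed by the caller */
    unsigned stride;       /* bytes per vertex */
    unsigned vertexCount;
    int primType;
    unsigned primCount;
    void *vertices;        /* owned copy, released by freeRecordedDraw */
} RecordedDraw;

/* Copies the caller's vertices and parameters.
   Returns 1 on success, 0 if the copy could not be allocated. */
static int recordDraw(RecordedDraw *out, unsigned fvf, unsigned stride,
                      int primType, unsigned primCount,
                      const void *vertices, unsigned vertexCount)
{
    size_t bytes = (size_t)stride * vertexCount;
    assert(out != NULL && stride > 0);
    out->vertices = malloc(bytes ? bytes : 1);
    if (out->vertices == NULL)
        return 0;
    if (bytes)
        memcpy(out->vertices, vertices, bytes);
    out->fvf = fvf;
    out->stride = stride;
    out->vertexCount = vertexCount;
    out->primType = primType;
    out->primCount = primCount;
    return 1;
}

static void freeRecordedDraw(RecordedDraw *r)
{
    free(r->vertices);
    r->vertices = NULL;
}
```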

It might be wise to extend our existing Texture objects
to act as Managers for collections of Textured Renderables.

Object Texture,TextureID,Primer
RedefineMethod Done
DefineVariable pName,Pointer,NULL
DefineVariable pTexture,Pointer,NULL
DefineVariable pRenderables_Opaque,Pointer,NULL
DefineVariable pRenderables_Translucent,Pointer,NULL
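A rough C counterpart of the Texture object above, assuming the two collections are intrusive singly linked lists. `RenderNode`, `drawId` and `Texture_AddRenderable` are hypothetical names; only the four field names come from the object definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; drawId would index the recorded draw data. */
typedef struct RenderNode {
    struct RenderNode *next;
    int drawId;
} RenderNode;

/* The texture plus the two collections it manages. */
typedef struct {
    const char *pName;
    void *pTexture;                      /* device texture handle */
    RenderNode *pRenderables_Opaque;
    RenderNode *pRenderables_Translucent;
} Texture;

/* File a Renderable under this texture, in the collection chosen by
   the alpha-blend state at the time of the pseudo-Draw call. */
static void Texture_AddRenderable(Texture *t, RenderNode *n, int translucent)
{
    assert(t != NULL && n != NULL);
    RenderNode **head = translucent ? &t->pRenderables_Translucent
                                    : &t->pRenderables_Opaque;
    n->next = *head;   /* prepend: O(1) */
    *head = n;
}

static size_t listLength(const RenderNode *n)
{
    size_t k = 0;
    for (; n != NULL; n = n->next)
        ++k;
    return k;
}
```

New Renderables are prepended, so each list comes out in reverse submission order; for opaque geometry drawn with a Z-buffer that order does not matter.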

When the user issues the call to Present the backbuffer,
we perform the ACTUAL rendering of our Texture groups in two passes.
First we render opaques, then we render translucents.
After that's been done, we flush the collections and
make a call to the ACTUAL Present api function.
Code can be added to the TextureManager class to perform such chores.
In fact, it might be best to write a new RenderManager class which
inherits from TextureManager.
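The two-pass Present might look like this in C. This is a sketch, assuming hypothetical `DeviceHooks` callbacks in place of the real device calls and a hypothetical `RenderManager_Present` name; the `Tally` stub just counts calls so the flow can be tried without a device.

```c
#include <assert.h>
#include <stddef.h>

typedef struct RenderNode { struct RenderNode *next; int drawId; } RenderNode;

typedef struct {
    const char *pName;
    void *pTexture;
    RenderNode *pRenderables_Opaque;
    RenderNode *pRenderables_Translucent;
} Texture;

/* Hypothetical device hooks; a real manager would call the D3D device here. */
typedef struct {
    void (*setTexture)(void *user, void *texture);
    void (*draw)(void *user, const RenderNode *node);
    void (*present)(void *user);
    void *user;
} DeviceHooks;

/* One pass: each texture that has work is set once, then all of its Renderables drawn. */
static void renderPass(Texture *tex, size_t count, int translucent, const DeviceHooks *d)
{
    for (size_t i = 0; i < count; ++i) {
        const RenderNode *n = translucent ? tex[i].pRenderables_Translucent
                                          : tex[i].pRenderables_Opaque;
        if (n == NULL)
            continue;                       /* no work: skip the texture change */
        d->setTexture(d->user, tex[i].pTexture);
        for (; n != NULL; n = n->next)
            d->draw(d->user, n);
    }
}

static void RenderManager_Present(Texture *tex, size_t count, const DeviceHooks *d)
{
    assert(d && d->setTexture && d->draw && d->present);
    renderPass(tex, count, 0, d);           /* pass 1: opaques */
    renderPass(tex, count, 1, d);           /* pass 2: translucents */
    for (size_t i = 0; i < count; ++i)      /* flush the collections */
        tex[i].pRenderables_Opaque = tex[i].pRenderables_Translucent = NULL;
    d->present(d->user);                    /* the ACTUAL Present */
}

/* Recording stub for trying this out without a device. */
typedef struct { int setTextureCalls, drawCalls, presents; } Tally;
static void tallySet(void *u, void *t) { (void)t; ((Tally *)u)->setTextureCalls++; }
static void tallyDraw(void *u, const RenderNode *n) { (void)n; ((Tally *)u)->drawCalls++; }
static void tallyPresent(void *u) { ((Tally *)u)->presents++; }
```

A texture is only bound when it has work in the current pass, so across both passes each texture is set at most twice per frame, no matter how many Renderables used it.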

I would very much appreciate any input regarding this proposal.
Posted on 2006-07-13 12:35:22 by Homer
Hmmm, "Jan" - wasn't that one of Scali's accounts after he had been banned? :)
Posted on 2006-07-13 15:43:35 by f0dder
Heh, yeah, that's him.
Not a wonderful example of a human being perhaps, but quite switched on.
Credit where it's due.
Posted on 2006-07-13 17:27:44 by Homer
What about this simple code?
It could be extended to use multitexturing and a jump table if it needs to be a lot more advanced, with different translucent/opaque and other settings,
and different VBs too.
textureLUT contains the texture handles you would normally pass to the SetTexture API call.

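        mov ebx,0                           ; ebx = texture index
@@rl2:
        mov eax,[textureLUT+ebx*4]          ; texture handle for this index
        m2m maxmeshes,                      ; source operand not shown in the post
        m2m lengthof,                       ; source operand not shown in the post
        pcall lpd3dDevice.SetTexture, 0, eax
        mov esi,0                           ; esi = start vertex
@@rl1:
        pcall lpd3dDevice.DrawPrimitive, D3DPT_TRIANGLESTRIP, esi,lengthof
        add esi,lengthof
        .IF esi<maxmeshes
        jmp @@rl1
        .ENDIF
        add ebx,1
        .IF ebx<maxtextures
        jmp @@rl2
        .ENDIF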

Posted on 2006-07-28 07:40:14 by daydreamer
That's quite similar to the code I have implemented in the demo project at this time. It's naive, and it does not attempt to pre-sort materials by alpha... however, it DOES provide rendering of all Instances of a given Mesh on a per-Material basis, which does ALMOST eliminate texture thrashing in the general case.
If you neglect to pre-sort Materials, you are only fooling yourself.
My tutorial will cover pre-sorting in the very near future.
Posted on 2006-07-28 08:05:03 by Homer

With terrain, where you decide priority by layer (seafloor, sea, sand, grass, stone, snow), it doesn't need to be pre-sorted anyway.
I wonder whether, if there are too many different triangle strips for one texture, it would be better to convert them to a triangle list and issue a single triangle-list draw for each material. Smaller textures in the distance could be merged into a bigger one, with a single triangle-list draw for all distant objects/terrain using many textures, since far away everything can be shrunk to fit within the maximum texture size.
I think the GPU works a lot better with single API calls that render many polys.
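Converting strips to a list, as suggested above, is mostly index bookkeeping. A minimal C sketch, assuming 16-bit indices and the usual strip convention that every odd triangle has its first two vertices swapped so all triangles keep the same winding; `stripToList` is a hypothetical helper. Degenerate triangles (used to stitch strips together) are simply dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Expand one indexed triangle strip of n indices into triangle-list indices
   written at out. Returns the number of indices written (3 per kept triangle). */
static size_t stripToList(const unsigned short *strip, size_t n, unsigned short *out)
{
    size_t w = 0;
    assert(n == 0 || (strip != NULL && out != NULL));
    for (size_t i = 0; i + 2 < n; ++i) {
        unsigned short a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (a == b || b == c || a == c)
            continue;                       /* degenerate: drop it */
        if (i & 1) {                        /* odd triangle: restore winding */
            unsigned short t = a; a = b; b = t;
        }
        out[w++] = a;
        out[w++] = b;
        out[w++] = c;
    }
    return w;
}
```

Many strips that share a material can be appended into one index buffer this way and then drawn with a single DrawIndexedPrimitive call using D3DPT_TRIANGLELIST.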

A question on vertex buffers: can't I have many different ones, instead of keeping track of many different vertices/objects kept in one single buffer? Or is it costly to switch to a new one?

Posted on 2006-07-28 15:55:57 by daydreamer
An Ordering Table (OT) provides the best speed for sorting polygons by their average Z. It was introduced on the PlayStation 1, where you could easily sort ~30,000 (visible) polys at 60 fps on the 30 MHz RISC CPU. The PSX has no Z-buffer: during rendering you put each polygon's data (variable-sized structures) into a big dump array, and finally send the data to the GPU from that array.
At its core, an OT is an array of linked lists of polygons, usually 2000 of them.
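A minimal C sketch of the idea, assuming integer depth keys where a larger value is farther away; all names are hypothetical and the ids stand in for polygon packets. Insertion is O(1) (prepend to the bucket for that Z), and reading the buckets from far to near yields painter's order with no comparison sort at all.

```c
#include <assert.h>
#include <string.h>

#define OT_SIZE   2048   /* number of depth buckets */
#define MAX_PRIMS 4096

/* next == 0 terminates a bucket's list, so slot 0 is never used. */
typedef struct { unsigned short next; int id; } OTEntry;

static unsigned short ot[OT_SIZE];       /* head slot per depth bucket, 0 = empty */
static OTEntry prims[MAX_PRIMS + 1];
static unsigned short primCount;

static void ot_begin(void)
{
    memset(ot, 0, sizeof ot);
    primCount = 0;
}

/* O(1) insert: prepend the primitive to the bucket for its (clamped) depth. */
static void ot_add(int z, int id)
{
    assert(primCount < MAX_PRIMS);
    if (z < 0) z = 0;
    if (z >= OT_SIZE) z = OT_SIZE - 1;
    unsigned short slot = ++primCount;
    prims[slot].id = id;
    prims[slot].next = ot[z];
    ot[z] = slot;
}

/* Walk the buckets far to near, writing primitive ids in painter's order.
   Returns how many ids were written. */
static int ot_flush(int *order)
{
    int n = 0;
    for (int z = OT_SIZE - 1; z >= 0; --z)
        for (unsigned short i = ot[z]; i != 0; i = prims[i].next)
            order[n++] = prims[i].id;
    return n;
}
```

The cost is proportional to the number of polygons plus the number of buckets, which is why it scales so well; polygons that fall into the same bucket are not ordered against each other.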

The idea of an OT can be used to eliminate texture thrashing, too.
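Keying the same structure on texture ID instead of Z gives a batching table. In this sketch (hypothetical names, up to 256 texture IDs to match the `U8 textureID` field of the engine excerpt), `batch_flush` walks the buckets and counts one texture change per non-empty bucket, which is where a renderer would call SetTexture.

```c
#include <assert.h>
#include <string.h>

#define MAX_TEXTURES 256
#define MAX_POLYS    4096

typedef struct { unsigned short next; unsigned char textureID; int polyId; } Packet;

static unsigned short byTexture[MAX_TEXTURES];  /* list head per texture, 0 = empty */
static Packet packets[MAX_POLYS + 1];           /* slot 0 reserved to mean "none" */
static unsigned short packetCount;

static void batch_begin(void)
{
    memset(byTexture, 0, sizeof byTexture);
    packetCount = 0;
}

/* O(1): prepend the polygon to its texture's list. */
static void batch_add(unsigned char textureID, int polyId)
{
    assert(packetCount < MAX_POLYS);
    unsigned short slot = ++packetCount;
    packets[slot].textureID = textureID;
    packets[slot].polyId = polyId;
    packets[slot].next = byTexture[textureID];
    byTexture[textureID] = slot;
}

/* Writes polygon ids in draw order and returns how many texture changes
   that order needs (one per texture actually used). */
static int batch_flush(int *order, int *orderCount)
{
    int changes = 0, n = 0;
    for (int t = 0; t < MAX_TEXTURES; ++t) {
        if (byTexture[t] == 0)
            continue;
        ++changes;                              /* SetTexture(t) would go here, once */
        for (unsigned short i = byTexture[t]; i != 0; i = packets[i].next)
            order[n++] = packets[i].polyId;
    }
    *orderCount = n;
    return changes;
}
```

Submitting polygons with textures 7, 3, 7, 3 would need four texture changes in submission order; the table brings that down to two. Depth order is lost, so this suits opaque geometry drawn with a Z-buffer.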

Here's an excerpt of a software 3D engine of mine, using OT:

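U16 OT[2048];          // ordering table: head packet offset per Z bucket, 0 = empty
U8* OT_Packets;        // = malloc(200000); packet pool, addressed by byte offset
U16 OT_NextPacket = 0; // byte offset of the next free packet (a U16 offset reaches only the first 64 KB)

//=========[ structures ]============[
typedef struct{
U16 nextPrimitive;     // byte offset of the next packet in the same bucket, 0 = end of list
U8 typePrimitive;
U8 textureID;
S16 x0;
S16 y0;
S16 x1;
S16 y1;
S16 x2;
S16 y2;
}Primitive; // 16 bytes

typedef struct{
Primitive P;
U16 color;
U16 padding;
}PrimFlat; // 20 bytes

typedef struct{
Primitive P;
U16 u0,v0;
U16 u1,v1;
U16 u2,v2;
U8 lightness;
U8 padding[3];
}PrimTextured; // 32 bytes

typedef struct{
Primitive P;
U8 R[3],G[3],B[3];
}PrimGouraud; // hypothetical name; the excerpt does not show it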

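void Start_3D2_Scene(){
memset(OT, 0, sizeof(OT)); // every Z bucket starts empty
OT_NextPacket = 4;         // offsets start at 4, so 0 can mean "no packet"
}

void End_3D2_Scene(){
long CurZ;
U16 NextPacketIndex;
Primitive* pPrim;
if(OT_NextPacket==4)return; // no packets added
for(CurZ=2047; CurZ>=0; CurZ--){ // far to near, assuming a larger Z is farther away
if(!(NextPacketIndex = OT[CurZ]))continue; // empty bucket
while(NextPacketIndex){
pPrim = (Primitive*)&OT_Packets[NextPacketIndex];
NextPacketIndex = pPrim->nextPrimitive;
//----- draw the primitive ]-------------------[
if(pPrim->typePrimitive==1){
// textured triangle (PrimTextured)
}else if(pPrim->typePrimitive==2){
// type 2 (drawing code not shown in the excerpt)
}else if(pPrim->typePrimitive==3){
// flat color (PrimFlat)
}else if(pPrim->typePrimitive==4){
// flat color, alpha (PrimFlat)
}
}
}
}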

Below is an excerpt from a DrawMesh() proc; it executes the following code for each visible polygon:

//----[ set-up basic packet ]------------------------[
CurrentZ = (a->z + b->z + c->z) / 4; // scaled average Z: sum/4 is cheaper than sum/3 and still monotonic in the sum
CurPacket = (Primitive*) &OT_Packets[OT_NextPacket];
CurPacket->textureID = CurTextureID;
CurPacket->x0 = (S16)a->x;
CurPacket->y0 = (S16)a->y;
CurPacket->x1 = (S16)b->x;
CurPacket->y1 = (S16)b->y;
CurPacket->x2 = (S16)c->x;
CurPacket->y2 = (S16)c->y;
CurPacket->typePrimitive = 0;
CurPacket->nextPrimitive = OT[CurrentZ]; // link in front of this bucket's current head

if(f->flags & FACE_TEXTURED){
PrimTextured* CurTexPacket;
CurPacket->typePrimitive = 1;
CurTexPacket = (PrimTextured*)CurPacket;
CurTexPacket->lightness = CachedNormalsZ;
CurTexPacket->u0 = f->u0;
CurTexPacket->v0 = f->v0;
CurTexPacket->u1 = f->u1;
CurTexPacket->v1 = f->v1;
CurTexPacket->u2 = f->u2;
CurTexPacket->v2 = f->v2;
OT[CurrentZ] = OT_NextPacket;            // this packet becomes the new bucket head
OT_NextPacket+= sizeof(PrimTextured);
}else if(f->flags & FACE_ALPHA){
CurPacket->typePrimitive=4; // flat color, alpha
((PrimFlat*)CurPacket)->color = LitTexels[(CachedNormalsZ<<8) + f->color];
OT[CurrentZ] = OT_NextPacket;
OT_NextPacket+= sizeof(PrimFlat);
}else{
CurPacket->typePrimitive=3; // flat color
((PrimFlat*)CurPacket)->color = LitTexels[(CachedNormalsZ<<8) + f->color];
OT[CurrentZ] = OT_NextPacket;
OT_NextPacket+= sizeof(PrimFlat);
}

Posted on 2006-07-28 19:07:08 by Ultrano
An OT sounds nice if it's made to sort by texture as well, and uses the Z-buffer instead.
Why not have the final output of the 3D transform proc expand all small triangle strips into a triangle list, and then perform a single SetTexture/DrawPrimitive(TRIANGLELIST)?

Many smaller textures, and downsized textures for distant rendering, are used instead of some mipmap levels and merged into bigger textures, with the UV coordinates changed accordingly.
All objects beyond a certain distance (or below a certain on-screen size) are then rendered without any texture change, with the help of a merged texture.

This can be done dynamically when the game initializes and detects the maximum texture size, or at install time.

An object such as a spaceship with a force field around it could use a single texture with an alpha channel, with all its meshes sorted so that they are rendered in the right order.
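The "right order" for the translucent parts is normally back to front. A minimal C sketch, assuming each translucent mesh has a hypothetical precomputed `viewZ` (distance along the view direction, larger meaning farther):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-mesh sort record. */
typedef struct { int meshId; float viewZ; } TranslucentItem;

/* qsort comparator: larger viewZ (farther away) sorts first. */
static int farthestFirst(const void *a, const void *b)
{
    float za = ((const TranslucentItem *)a)->viewZ;
    float zb = ((const TranslucentItem *)b)->viewZ;
    return (za < zb) - (za > zb);
}

static void sortBackToFront(TranslucentItem *items, size_t n)
{
    if (n < 2)
        return;                 /* nothing to sort; also avoids qsort on NULL */
    assert(items != NULL);
    qsort(items, n, sizeof *items, farthestFirst);
}
```

The opaque hull would be drawn first with Z writes on; the sorted translucent meshes (such as the force field) follow.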
Posted on 2006-07-30 10:26:17 by daydreamer
Homer's very texture-switching terrain demo on different hardware
Posted on 2006-07-30 11:10:39 by daydreamer
Same CPU etc., newer GPU.
Posted on 2006-07-30 11:25:30 by daydreamer
Ah yes, that was one of my very first D3D demos.
There are 256 textures, but there's no texture thrashing.
Each texture is set once.

There's a lot better ways to achieve that kind of blending, but that demo was designed to work on the most pathetic and featureless cards (which is what I had when I wrote it, and what I've been forced back to recently).

Generating those textures at runtime from four input textures and the heightmap - that was the interesting part.
I was pretty happy with it at the time :)
Posted on 2006-08-02 03:51:25 by Homer